Abstract


Certified  code  is  native  machine  code  that  is  annotated  with  an  automatically  checkable  certificate 
attesting  to  the  conformance  of  the  program  to  a  specified  safety  policy.  Certified  code  frameworks 
have  been  built  based  on  first-order  logic  (PCC)  and  on  types  (TAL).  Compilers  generating  certified 
code  have  been  built  for  safe  subsets  of  C  and  for  Java(tm). 

Type-preserving compilers such as the TILT/ML compiler implement compilation as transformations
on typed internal languages. Types are used by the compiler for internal verification, and for
optimization  purposes.  Type  analysis  can  be  used  to  implement  optimizations  such  as  non-uniform 
data  representation  and  tag-free  garbage  collection.  However,  none  of  the  existing  type-preserving 
compilers  for  full-scale  languages  maintain  type  information  all  the  way  to  the  machine-code  level, 
and  hence  are  not  yet  able  to  generate  certified  code. 

In  this  thesis,  I  demonstrate  that  certified  compilation  is  possible  in  a  type  analysis  framework 
by extending the TILT/ML compiler to generate certified code in the form of typed assembly language
without compromising the existing optimizations of the compiler. This work demonstrates
that a compiler can use types to generate certified binaries for a full modern language even in the
presence of aggressive type-analysis based optimization.

Chapter 1

Introduction 


1.1  Types  in  compilation 

It  is  an  unfortunate  fact  about  the  state  of  programming  that  even  good  programs  sometimes  go 
wrong.  It  is  not  uncommon  for  programs  to  crash  or  misbehave,  whether  accidentally,  or  with 
malicious  intent.  This  problem  has  been  greatly  compounded  in  recent  years  by  the  proliferation 
of  mobile  code.  More  and  more  of  the  code  that  is  run  is  downloaded  in  bits  and  pieces  from 
various sources. Examples of this include JavaScript and Java(TM) programs downloaded into a
web browser, applications downloaded directly over the internet, and code run on behalf of others
(such as the SETI@HOME project, which uses donated spare processor cycles to further the search
for extra-terrestrial life). The proliferation of mobile code is expected only to increase as networked
technology  becomes  more  a  part  of  everyday  life. 

However,  it  is  particularly  hard  to  trust  the  behavior  of  this  sort  of  code.  The  code  producer 
may  be  unknown  to  the  code  consumer,  or  the  identity  of  the  producer  may  be  spoofed.  Moreover, 
even  trusted  producers  occasionally  produce  programs  that  go  wrong. 

It  would  certainly  seem  desirable  to  be  able  to  rule  out  programs  that  are  unsafe.  Unfortunately, 
determining the safety of arbitrary machine code is undecidable. A compromise solution that has
been in existence for several decades is to restrict ourselves to programming in a language which
has the property that all programs are safe. The resulting compiled code is therefore safe by
construction, modulo the correctness of the compiler. Languages such as LISP, ML and Java all
have this property: all programs written in these languages are guaranteed with some level of
certainty not to produce undefined and unsafe behavior.

Unfortunately  for  the  purposes  of  mobile  code,  the  safety  properties  enjoyed  by  these  languages 
are only guaranteed at the source level: once the source code has been compiled to machine instructions,
the safety guarantee lies only in the implicit property of being in the image of the safe
language  under  compilation.  One  solution  to  this  is  to  ship  around  instead  something  tantamount 
to  source  code,  allowing  the  consumer  to  validate  the  code  independently.  This  is  in  essence  the 
Java(TM)  byte-code  solution.  This  places  much  of  the  burden  of  compilation  on  the  code  consumer, 
who  must  still  in  turn  trust  that  their  own  compiler  is  correct. 

The  two  most  commonly  used  solutions  then,  are  either  to  accept  arbitrary  machine  code  based 
on  trust  in  the  producer  of  the  code,  or  to  accept  only  annotated  high-level  code  and  to  trust  the 
local  compiler.  Both  of  these  solutions  leave  much  to  be  desired. 

To improve on this, several systems have been proposed for annotating machine code in such a way
that safety remains checkable. Along with the code, the code consumer receives a certificate that
can be used to check that the code conforms to the correctness assertions that it claims. The
only  software  that  the  code  consumer  must  still  trust  is  the  checker  itself.  Two  notable  examples  of 
this  include  Proof  Carrying  Code  (PCC)  from  CMU  [Nec98]  and  Typed  Assembly  Language  (TAL) 
from  Cornell  [MWCG98].  The  PCC  system  provides  for  machine  code  to  be  annotated  with  proofs 
of  safety  properties  done  in  first-order  logic.  The  code  consumer  simply  checks  that  the  proofs  are 
indeed  correct  -  a  relatively  simple  procedure.  This  system  is  very  general,  in  that  it  can  be  used 
to  certify  any  property  which  can  be  expressed  in  the  logic.  TAL  on  the  other  hand  specializes  to 
the  particular  property  of  type  safety.  TAL  provides  for  a  machine  code  which  is  annotated  with 
types,  which  can  then  be  type  checked  by  the  code  consumer. 

A  certifying  compiler  is  one  which  produces,  along  with  its  normal  output,  a  certificate  which 
can  be  used  to  check  that  the  generated  code  is  safe  according  to  some  policy.  Certifying  compilers 
have been written translating safe subsets of C to both PCC and TAL [MCG+99, NL98]. More
ambitiously, a full-scale Java(TM) compiler has been written targeting PCC [CLN+00].

The  TILT  (TIL  Two)  compiler  is  an  optimizing  compiler  developed  at  CMU  that  implements 
the  full  Standard  ML  ’98  definition  and  includes  support  for  separate  compilation.  Important  ideas 
pioneered  in  TILT  and  its  predecessor  TIL  include  using  intensional  polymorphism  [HM95]  to 
reduce  the  cost  of  implementing  polymorphism  and  garbage  collection.  Compilation  proceeds  as 
a  series  of  typed  transformations  into  successively  lower  level  typed  languages.  Type  information 
is  used  to  allow  for  optimized  data  representations  and  to  do  “almost  tag-free”  garbage  collection. 
Prior to this work however, type information was mostly erased in TILT well before the transformation
to machine code was made, and hence safety properties of the resulting code could only be
asserted  -  not  checked. 

TILT  uses  types  during  compilation  for  optimization  purposes,  and  consequently  requires  an 
intermediate language with a very expressive type theory. Previous work on typed assembly languages
has primarily focused on preserving type information for certification purposes. I claim that
these  two  uses  of  low  level  typed  languages  are  compatible.  A  compiler  can  use  types  to  generate 
certified  binaries  while  retaining  the  ability  to  perform  complicated  type  based  optimizations  on  a 
full  modern  language. 

I have demonstrated this by extending the TILT/ML compiler to maintain type information used
in  the  intermediate  passes  of  the  compiler  all  the  way  through  code  generation,  producing  certified 
binaries  without  sacrificing  the  ability  to  perform  type  analysis  optimizations.  This  dissertation 
gives  a  careful  theoretical  description  of  the  key  elements  of  the  compilation  process,  and  proves 
soundness  theorems  for  the  translations  between  the  major  intermediate  languages.  A  description 
is  also  given  of  the  actual  implementation,  including  some  empirical  results. 

It  is  becoming  increasingly  clear  that  type  preserving  compilation,  in  addition  to  serving  as  a 
mechanism  for  enabling  safe  mobile  code,  also  provides  great  benefits  in  its  own  right  as  a  compiler 
engineering  technique.  Just  as  type  safe  languages  allow  programmers  to  write  correct  code  more 
quickly,  type  preserving  compilation  allows  compiler  implementers  to  write  correct  compilers  more 
quickly.  Type  checking  the  intermediate  representations  of  programs  within  the  compiler  allows 
many  or  most  compiler  bugs  to  be  caught  during  compilation  and  to  be  localized  to  the  particular 
point  of  failure  in  the  compiler.  Whole  classes  of  pernicious  compiler  bugs  (such  as  those  that 
involve  stack  or  memory  corruption)  can  be  eliminated  by  producing  checkably  type  safe  code. 
One  of  the  results  of  this  thesis  is  to  provide  further  evidence  of  the  efficacy  of  this  technique,  and 
to add to the body of experience in the engineering of type preserving compilers. This aspect of
the implementation experience is discussed further in section 10.1.4.

Figure 1.1: TILT Architecture


1.2  The  non-certifying  TILT  compiler 

The  past  years  have  seen  a  great  deal  of  interest  in  the  idea  of  “typed  compilers”  that  maintain 
type  information  deep  into  the  compilation  process.  Such  type  information  can  be  exploited  by 
the  compiler  internally  to  allow  for  optimized  data  representations  and  to  do  tag-free  garbage 
collection,  as  well  as  providing  the  compiler  with  a  basis  for  internal  correctness  checks.  This 
work was pioneered in the TIL compiler at CMU [TMC+96], and has been adopted by numerous
other  compilers,  including  the  Glasgow  Haskell  Compiler.  Other  recent  work  has  also  suggested  the 
possibility  of  maintaining  type  information  through  to  the  machine  code  as  a  form  of  certification 
[MWCG97]. 

The  TIL  compiler  clearly  demonstrated  that  typed  compilation  was  both  feasible  and  desirable. 
However,  TIL  compiled  only  the  core  language  of  Standard  ML:  the  powerful  modular  features 
that  are  one  of  the  most  important  elements  of  SML  were  not  dealt  with.  The  TIL  Two  (TILT) 
compiler  was  aimed  at  addressing  this  shortcoming. 

Figure  1.1  depicts  the  structure  of  the  non-certifying  TILT  compiler.  Its  architecture  is  based 
around  two  typed  intermediate  languages.  The  initial  elaboration  from  SML  source  targets  a 
structures  calculus  called  the  HIL  (High  Intermediate  Language).  This  language  is  relatively  close 
to  SML,  and  among  other  things  provides  the  interface  language  used  for  separate  compilation. 
After  elaboration  (and  hence  typechecking),  the  HIL  is  translated  to  a  second  typed  language 
called the MIL (Middle Intermediate Language) through a process called phase splitting [HMM90].
The  phase  splitting  process  maps  each  SML  structure  into  separate  type  and  term  level  records, 
representing  the  static  and  dynamic  portions  of  the  structure.  Similarly,  SML  functors  are  mapped 
to  type  and  term  level  functions.  In  this  fashion,  modular  programs  are  translated  into  programs 
containing  only  lambda  calculus  terms. 
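
The idea of phase splitting can be illustrated with a small sketch. The following Python fragment is only a toy model under stated assumptions: the names TypeComp, ValComp and phase_split are hypothetical and do not come from TILT, and type expressions are kept as plain strings. It shows a structure being separated into a static record of type components and a dynamic record of value components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeComp:
    """A type component of a structure, e.g. `type elem = int` (hypothetical model)."""
    name: str
    definition: str  # type expression, kept as a string for brevity


@dataclass(frozen=True)
class ValComp:
    """A value component of a structure, e.g. `val empty = []` (hypothetical model)."""
    name: str
    value: object


def phase_split(structure):
    """Split a structure into a (static, dynamic) pair of records."""
    static = {c.name: c.definition for c in structure if isinstance(c, TypeComp)}
    dynamic = {c.name: c.value for c in structure if isinstance(c, ValComp)}
    return static, dynamic


# Toy counterpart of a structure with two type and two value components.
int_set = [
    TypeComp("elem", "int"),
    TypeComp("set", "elem list"),
    ValComp("empty", []),
    ValComp("size", len),
]
static_part, dynamic_part = phase_split(int_set)
```

In this model a functor would correspondingly become a pair of functions, one acting on static records and one acting on dynamic records; that part is omitted from the sketch.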

The MIL is the language in which almost all of the optimization passes implemented in TILT
are done. This constrains the design of the MIL, since it must be possible to express the results of
all of the desired optimizations in a typed fashion. In particular, it is important that primitives for
data representation optimizations be present at this level. By “hiding” type analysis inside of a few
primitives, the MIL avoids the need for a general typecase construct as used in the λ_i^ML calculus.
Nonetheless,  the  fact  that  some  MIL  primitives  do  indeed  analyze  their  types  mandates  a  type 
passing  interpretation  for  the  MIL  operational  semantics. 

All  of  the  intermediate  languages  of  the  TILT  compiler  up  to  and  including  the  MIL  are 
typed,  and  all  of  the  compiler  passes  on  these  languages  are  type-preserving  in  the  sense  that 
they  map  well-typed  programs  to  well-typed  programs.  Unfortunately  for  certification  purposes, 
the  subsequent  languages  from  the  MIL  on  down  are  not  typed,  and  hence  the  generated  code 
cannot  in  general  be  proven  safe.  This  dissertation  replaces  this  un-typed  backend  with  a  new 
type-preserving  backend  that  produces  certified  code. 


1.3  The  certifying  TILT  compiler 

One of the major goals of certifying compilation is to ensure that the certifying compiler is type-preserving:
that is, that it maps well-typed programs in the source language to well-typed programs
in  the  target  language.  In  order  to  show  that  this  is  the  case,  I  spend  the  first  part  of  this  dissertation 
presenting  idealized  versions  of  the  compiler  intermediate  languages,  and  proving  the  soundness  of 
the  translations  between  them.  The  next  two  sections  describe  the  idealized  compiler  and  its 
relation  to  the  implementation. 

1.3.1  The  theoretical  compiler 

The  theoretical  portion  of  this  dissertation  describes  the  framework  for  a  translation  mapping  the 
original  TILT  internal  language  (the  MIL,  described  in  chapter  2)  down  to  a  typed  assembly 
language  that  I  call  TILTAL  (Typed  Assembly  Language  for  TILT).  This  translation  uses  as 
an  intermediate  stage  a  new  internal  language  called  the  LIL  (Low  Level  Language).  The  LIL  is 
an  impredicatively  typed  lambda  calculus  based  on  Crary  and  Weirich’s  LX  [CW99].  Figure  1.2 
describes  the  structure  of  the  theoretical  compiler. 

The  LIL,  described  in  chapter  3,  provides  a  very  rich  type  system  in  which  the  type  analysis  of 
TILT  can  be  represented  using  term  level  constructs.  In  addition  to  various  engineering  benefits, 
this  fact  allows  us  to  take  a  type  erasure  interpretation,  instead  of  the  type  passing  interpretation 
apparently  mandated  by  the  MIL  primitives.  The  fact  that  we  can  embed  type  analysis  into  the 
term  level  reflects  in  a  typed  fashion  exactly  the  techniques  already  used  in  an  untyped  fashion 
for  implementing  type  passing  languages.  Chapter  4  gives  a  brief  introduction  to  the  type  analysis 
methodology  of  the  LIL  via  a  worked  example. 

The  translation  from  MIL  to  LIL  serves  to  make  type  analysis  and  type  representations  explicit 
in  the  term  language.  This  translation  is  described  in  detail  in  chapter  5.  A  subsequent  pass 
mapping  LIL  terms  to  LIL  terms  performs  closure  conversion.  The  closure  conversion  translation 
is  described  in  chapter  6. 

Finally, I give a translation from the LIL into TILTAL. TILTAL code is essentially machine
code with type annotations added, allowing it to be typechecked. Currently, in addition to the
standard assembly language instructions, there are several typed primitives corresponding to assembler
macros. These primitives handle memory allocation (and hence the interaction with the garbage
collector) and array bounds checking. The TILTAL language is introduced in chapter 7, and the
translation from LIL programs to TILTAL programs is given in chapter 8.

Figure 1.2: Structure of the theoretical compiler

The  theoretical  LIL  is  actually  very  close  to  the  LIL  as  implemented  in  the  compiler,  mostly 
lacking  only  an  extended  set  of  basic  primitive  operations.  The  presentation  of  the  TILTAL  target 
language is intended to suggest how the ideas used to implement type analysis in the LIL translate
down to the assembly code level. The actual TALx86 implementation departs significantly
from  that  given  here.  However,  for  the  most  part  the  theoretical  translations  between  the  various 
languages  are  designed  to  capture  all  or  most  of  the  essential  details  of  the  implementation.  In 
particular,  the  LIL  to  TILTAL  translation  of  chapter  8  attempts  as  much  as  possible  to  account 
for  the  actual  code  generation  techniques  used  by  the  implementation. 

The  main  result  of  the  theoretical  portion  of  this  dissertation  is  a  proof  that  the  compilation 
of  LIL  programs  into  TILTAL  programs  preserves  types,  in  the  sense  that  each  of  the  individual 
translations is proved to map well-typed terms to well-typed terms. Proofs of soundness for the
various  translation  phases  are  given  in  the  chapters  in  which  they  are  defined. 


1.3.2  Compiler  implementation 

The  second  half  of  this  thesis  is  an  actual  implementation  of  a  certifying  compiler  for  Standard 
ML.  The  theoretical  compiler  discussed  in  the  previous  section  is  intended  as  a  model  for  the 
implementation.  Where  the  original  architecture  (see  figure  1.1)  switches  to  an  untyped  language, 
the  new  implementation  must  instead  take  the  typed  output  and  continue  the  compilation  process 
in  a  typed  setting,  as  shown  in  figure  1.3.  The  certifying  TILT  implementation  is  discussed  in 
chapter 9.

Figure 1.3: Structure of the certifying TILT backend

The implementation of certifying TILT required the addition of essentially five compiler
stages:

•  MIL  singleton  elimination 

•  Translation  to  LIL 

•  Closure  conversion 

•  Optimization 

•  Translation  to  TALx86 

To  simplify  the  meta-theory  of  the  LIL,  I  have  chosen  not  to  support  the  singleton  kinds  of 
the  original  MIL  as  implemented  in  the  compiler.  Therefore,  before  (or  concurrent  with)  the 
translation to LIL, singletons must be eliminated from MIL code. A proof that this is possible,
and a simple algorithm for doing so, have been given by Crary [Cra00]. The implementation of this
algorithm  and  its  effects  on  compiled  code  are  discussed  in  section  9.1. 

The  translation  to  LIL  is  described  in  detail  in  chapter  5.  The  most  interesting  part  of  this 
translation  is  the  process  of  making  type  analysis  explicit.  The  MIL  has  no  general  notion  of 
type  analysis,  instead  using  type  analyzing  primitives  to  implement  specific  optimizations  such  as 
floating  point  array  unboxing  and  flattening  function  arguments  into  records.  The  translation  to 
LIL  compiles  these  primitives  into  uses  of  a  general  type  analysis  mechanism. 

The  MIL  implements  all  of  the  optimizations  that  TILT  supports,  so  extended  optimization 
of  LIL  code  is  mostly  unnecessary.  However,  the  translations  tend  to  produce  code  that  can 
be  improved  significantly  by  a  simple  optimizer:  for  example,  the  closure  converter  frequently 
introduces  dead  code  and  projections  from  known  records.  Rather  than  making  other  phases  do 
significantly  more  work  to  avoid  this,  it  was  simpler  and  cleaner  to  implement  a  basic  optimization 
pass  for  the  LIL.  Note  that  the  optimizer  does  not  have  a  counterpart  in  the  theoretical  model. 
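
To make the kind of cleanup described above concrete, here is a minimal sketch of such a pass over a toy named-form program. It is not the LIL optimizer: the operation names ("const", "record", "select", "add"), the binding-list representation and the function names are all assumptions made for illustration, and every operation is assumed to be free of side effects so that unused bindings can be dropped.

```python
def uses(op):
    """Return the variables mentioned by an operation."""
    kind = op[0]
    if kind == "record":
        return list(op[1])
    if kind == "select":
        return [op[2]]
    if kind == "add":
        return [op[1], op[2]]
    return []  # "const" mentions no variables


def optimize(bindings, result):
    """Resolve projections from known records, then drop dead bindings."""
    records = {}  # variable -> fields of a record built earlier
    alias = {}    # variable -> variable it is known to be equal to

    def resolve(v):
        return alias.get(v, v)

    out = []
    for var, op in bindings:
        kind = op[0]
        if kind == "record":
            fields = [resolve(v) for v in op[1]]
            records[var] = fields
            out.append((var, ("record", fields)))
        elif kind == "select":
            index, src = op[1], resolve(op[2])
            if src in records:
                alias[var] = records[src][index]  # projection from a known record
            else:
                out.append((var, ("select", index, src)))
        elif kind == "add":
            out.append((var, ("add", resolve(op[1]), resolve(op[2]))))
        else:
            out.append((var, op))

    result = resolve(result)
    live = {result}
    kept = []
    for var, op in reversed(out):  # backwards liveness sweep
        if var in live:
            kept.append((var, op))
            live.update(uses(op))
    kept.reverse()
    return kept, result


program = [
    ("a", ("const", 1)),
    ("b", ("const", 2)),
    ("env", ("record", ["a", "b"])),   # environment record, as a closure converter might build
    ("x", ("select", 0, "env")),       # projection from a known record
    ("unused", ("add", "b", "b")),     # dead code
    ("y", ("add", "x", "x")),
]
optimized, optimized_result = optimize(program, "y")
```

After the pass, only the bindings for a and y remain in this example: the projection is replaced by the field it selects, and the record, b and the unused sum become dead.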

It  would  be  appealing  to  rely  as  well  on  the  MIL  closure  converter,  and  only  translate  closure 
converted  code.  Unfortunately,  the  MIL  notion  of  closure  conversion  is  not  easily  compatible  with 
the  LIL  notion,  since  the  MIL  must  closure-convert  types  as  data.  Translating  closure  converted 
MIL  code  would  require  “de-closure  converting”  types,  which  is  not  appealing.  For  this  reason, 
a  separate  closure  conversion  pass  for  the  LIL  was  implemented,  closely  modeled  on  the  formal 
closure  conversion  translation  from  chapter  6. 

In  order  to  allow  intermediate  code  to  be  validated,  it  was  also  useful  to  implement  a  type  checker 
for  the  LIL  language.  This  allows  the  output  of  the  various  phases  of  the  compiler  to  be  checked 
for  errors  internally.  The  ability  to  check  type  correctness  of  internal  program  representations  has 
proved  valuable  in  the  development  of  TILT. 

Lastly,  a  translation  is  given  mapping  LIL  programs  into  TALx86  programs.  Note  that  here, 
the  implementation  diverges  significantly  from  the  formal  model  described  in  chapter  8  in  that 
the  TILTAL  and  TALx86  languages  are  quite  different.  However,  the  theoretical  model  was 
carefully  designed  to  capture  many  of  the  interesting  algorithmic  approaches  used  in  the  actual 
implementation.

The  final  portion  of  the  dissertation  gives  a  basic  quantitative  evaluation  of  the  implementation, 
including  measurements  of  the  size  of  the  generated  code  and  certificates  to  get  a  feeling  for  the 
overhead  of  certification.  Some  measurements  of  run  time  behavior  of  the  compiled  programs  are 
also given.

It is important to state here that while I measure these quantities to gain some understanding
of  the  cost  of  my  approach,  optimizing  for  small  certificates  and  fast  certification  time  is  beyond  the 
scope  of  this  dissertation.  The  purpose  of  this  work  is  to  demonstrate  the  feasibility  of  using  types 
to  certify  type  analyzing  code  generated  from  a  full-scale,  general  purpose  language.  Engineering 
the  representations,  while  certainly  important  for  making  the  compiler  practical,  was  not  a  primary 
goal. 

1.4  Overview 

In  the  next  chapter,  I  provide  a  brief  introduction  to  a  simplified  version  of  the  MIL  intermediate 
language which is the original source language for the compilation process described in this dissertation.
Chapter 3 describes the LIL language which serves as the main intermediate language of
the  new  backend.  The  complete  static  semantics  of  the  MIL  and  the  LIL  are  given  in  appendixes 
A and B, respectively. Chapter 4 gives an introduction to the style of type analysis used in the LIL
via  an  extended  example.  The  translation  from  MIL  to  LIL  is  described  in  detail  in  chapter  5,  and 
a  proof  of  soundness  is  given.  Closure  conversion  of  LIL  terms  is  described  and  proved  sound  in 
chapter  6.  Chapter  7  introduces  the  TILTAL  typed  assembly  language  which  serves  as  the  target 
of  the  theoretical  compiler.  A  translation  from  LIL  programs  to  TILTAL  programs  is  described 
and proved sound in chapter 8. Finally, the implementation and its relation to the theoretical
presentation are discussed in chapter 9, and some empirical results about the implementation are
presented. 


Chapter  2 

MIL 


I  describe  the  MIL  here  only  briefly,  as  the  theoretical  and  practical  aspects  of  this  calculus  have 
been discussed in detail elsewhere [PCHS00, SH99, VDP+03]. For the purposes of the translation,
I assume that singletons have already been eliminated [Cra00], although in practice it may be
desirable to do this concurrently with the translation. The syntax of the singleton-free MIL is
given in figure 2.1. The MIL is a predicative lambda calculus based on Girard’s Fω extended with
primitives for type analysis. The intention is that these primitives are definable in terms of the λ_i^ML
calculus of Harper and Morrisett [HM95]. In the untyped TILT back-ends, these primitives are
only  compiled  directly  after  moving  to  an  untyped  calculus.  The  complete  static  semantics  for  the 
MIL  are  given  in  appendix  A  and  are  fairly  straightforward.  The  remainder  of  this  section  gives 
a  high-level  overview  of  the  structure  of  the  language  and  describes  the  type-analysis  methodology 
it  employs. 


2.1  Kinds  and  constructors 

The  kind  structure  of  the  MIL  is  relatively  simple  in  the  absence  of  singletons,  with  function 
and  product  kinds  to  classify  constructor  functions  and  tuples,  and  the  base  kind  T32  to  classify 
constructors which classify terms. The notation T32 indicates that the terms classified by constructors
of this kind are intended to be represented by a 32 bit quantity after compilation. I use the
term “constructor” in preference to the term “type” when referring to constructors of arbitrary or
unspecified  kind.  I  reserve  the  standard  terminology  “type”  for  constructors  of  kind  T32. 

Most of the base constructors are completely standard, with the exception of the treatment
of sums and the constructs for type analysis. The sum type encodes the number of non-value-carrying
fields directly, instead of simply using unit as the carried value. In addition, the MIL has
a known sum type, corresponding to the type of a sum for which the branch inhabited is known.
A special projection operation projects the carried value out of a known sum. This allows the case
construct to avoid destructing its value, which may be unnecessary if the arm doesn’t actually use
the carried value.

Type  analysis  is  present  at  the  type  level  in  the  form  of  the  Vararg  construct  and  implicitly 
in  the  Array  type.  The  Vararg  type  classifies  the  term-level  vararg  construct,  which  is  used  to 
implement  non-standard  calling  conventions  for  functions  which  take  tuples  as  arguments.  In  TILT 
tuple  arguments  to  functions  are  always  flattened  into  registers  if  the  number  of  fields  in  the  tuple  is 
small. In order to make this work with polymorphic functions, it is necessary to use type analysis.

κ ::= T32 | κ1 → κ2 | κ1 × κ2

c  ::  =  a  I  Int  |  Boxf  |  ^  c  |  ci  x  . . .  x 

| μ(α, β).(c1, c2) | (c1, c2) | π1 c | π2 c | λ(α::κ).c | c1 c2

I  Vararg^^^^^  |  Vararg^^^t^^  |  Sumi(^  |  Sum^(^ 

I  Array^  |  Farr  ay  |  Exn  |  Dyntag^ 

r  ::  =  T(c)  |  V[a:TK](r)(z)  — >  r  |  V[a:'!/i](r)(i)  — >*  r  |  ri  x  . . .  x 
fv  ::  =  tr  I  a:/ 

sr  ::=  x  |  i  |  rollcsr  |  unrollc  sr 

I 

opr  ::=  sr  |  vararg^^^^^  sv  \  onearg^^^^^ 

I  boxf  fv  I  unboxf  sv  \  fv 
I  proj*sr  I 
I  (sr)  I  select*  sr 

I  case(s?;)(xi.ei, . . . ,  Xn.e„)  |  handleT-(ei,  x.e2) 

I  s?;[^(sr)(/r)  I  inj_dyn^(s?;i,sr2) 

I  raise,-  sv  \  mkexntag^ 

I  exncase,-(s?;)(s?;i  Xi-Ci,  _  ^  62) 

I  subc(sri,  5^2)  I  f  sub(s?;i,  5x2) 

I  upd^(s?;i,sr2,st3)  |  fupd(s?;i,  s?;2,/r) 

I  array^(s?;i,  sr2)  I  f  array(s?;,/r) 

e  ::=  sr  |  let,- rec,- /[Q;:'TK](xrr)(x}-).e  ine 
I  let,-  X  =  opr  in  e  |  let,-  Xf  =  opr  in  e 

Δ ::= • | Δ, α:κ

r  ::=  •  I  r,  x:t  \  F,  x^^ 

Figure 2.1: MIL syntax

The Vararg type can be thought of as simply a type level typecase in a λ_i^ML-like language which
branches  on  the  argument  type  of  a  function  type.  If  the  argument  type  is  a  small  tuple,  it  returns 
a  multi-argument  function  type  where  the  fields  of  the  tuple  have  been  flattened.  Otherwise,  it 
leaves  the  function  type  unchanged. 
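
The behaviour just described can be sketched as a function on type representations. This is only an illustration under stated assumptions: the tuple-based encoding of types, the name vararg and the threshold MAX_FLATTENED_FIELDS are hypothetical, since the text says only that the tuple must be "small". When the argument type is still a type variable, the sketch leaves the Vararg application unreduced, which reflects why polymorphic code needs type analysis to choose a calling convention.

```python
# Hypothetical cut-off for what counts as a "small" tuple.
MAX_FLATTENED_FIELDS = 6

INT = ("int",)


def vararg(arg, result):
    """Type-level typecase on the argument type of a function type."""
    tag = arg[0]
    if tag == "var":
        # Argument type not yet known: the choice must be deferred.
        return ("vararg", arg, result)
    if tag == "tuple" and len(arg[1]) <= MAX_FLATTENED_FIELDS:
        # Small tuple: flatten its fields into separate arguments.
        return ("arrow", list(arg[1]), result)
    # Anything else: an ordinary one-argument function type.
    return ("arrow", [arg], result)


pair = ("tuple", [INT, INT])
flattened = vararg(pair, INT)           # two-argument function type
unchanged = vararg(INT, INT)            # one-argument function type
deferred = vararg(("var", "a"), INT)    # stays a Vararg application
```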

The  Array  type  implicitly  distinguishes  between  arrays  of  boxed  floating  point  numbers  and 
other  arrays,  in  order  to  flatten  the  boxed  float  arrays.  This  is  discussed  in  more  detail  below.  The 
rest  of  the  constructor  level  is  a  standard  typed  lambda  calculus,  classified  by  the  function  and 
product  kinds. 

2.2  Proper  types 

Also  described  in  figure  2.1  is  the  syntax  for  the  type  level.  Unlike  the  constructor  level  which 
corresponds  to  the  notion  of  types  as  data,  the  type  level  in  a  predicative  system  corresponds  to  the 
notion of types as classifiers. The constructor level is included into the type level via an explicit
inclusion  T(c).  The  type  level  also  contains  classifiers  for  polymorphic  functions  and  tuples  of 
terms.  The  duplication  of  the  tuple  type  at  the  type  level  indicates  the  possibility  of  constructing 
pairs  containing  polymorphic  functions  which  is  not  provided  for  by  the  constructor  level. 

2.3  Terms 

For  the  most  part,  the  term-level  MIL  is  a  standard  lambda  calculus  with  a  few  non-standard 
primitives.  The  presentation  here  restricts  the  syntax  to  a  named  form  [FSDF93,  Mog89]  to  reflect 
the  form  used  internally  in  the  TILT  compiler.  Named  form  forces  incremental  calculations  to  be 
given  names  via  variable  binding  (hence  named  form)  which  is  important  for  various  compiler  passes 
and code generation. Unlike most lambda calculi, the MIL also includes low level data representation
primitives (such as float boxing and unboxing primitives). This allows data representation
optimizations  to  be  expressed  at  the  level  of  the  MIL. 
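
As an illustration of what named form means, the sketch below converts a nested arithmetic expression into a sequence of bindings in which every intermediate result receives a name. The expression encoding, the temporary names t0, t1, ... and the functions to_named_form and render are assumptions made for this example and are not taken from the TILT sources.

```python
import itertools


def to_named_form(expr):
    """Flatten a nested expression into (bindings, result atom).

    An expression is an int, a variable name (str), or a triple (op, left, right).
    """
    counter = itertools.count()
    bindings = []

    def go(e):
        if isinstance(e, (int, str)):
            return e  # already atomic
        op, left, right = e
        l = go(left)
        r = go(right)
        name = f"t{next(counter)}"
        bindings.append((name, (op, l, r)))
        return name

    result = go(expr)
    return bindings, result


def render(bindings, result):
    """Print the bindings in a let-style concrete syntax."""
    lines = [f"let {name} = {op} {a} {b} in" for name, (op, a, b) in bindings]
    return "\n".join(lines + [str(result)])


# (x * 2) + (y * 3)
nested = ("add", ("mul", "x", 2), ("mul", "y", 3))
named_bindings, named_result = to_named_form(nested)
```

Because each operation now has only atomic operands, later passes can inspect or rewrite one binding at a time without traversing nested subexpressions.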

A  key  optimization  that  TILT  implements  is  the  use  of  non-uniform  data  representation.  Many 
implementations of languages with polymorphism require that all values fit into a word. In particular,
array elements must always be word-sized, which means that arrays of 64 bit floats (for
example)  must  actually  be  arrays  of  pointers  to  floats.  This  is  unfortunate  because  of  the  wasted 
space,  the  extra  indirections  to  access  the  data,  and  because  of  the  consequent  loss  of  data  locality. 
TILT  avoids  this  by  incorporating  type  analysis  into  the  array  primitives.  By  passing  types  at 
runtime  and  allowing  code  to  dispatch  on  them,  unboxed  floating  point  arrays  can  be  used  with 
the  appropriate  subscript  stride  chosen  at  runtime. 

The MIL calculus differs from the λ_i^ML calculus of [HM95] in that it does not contain an explicit
type  analysis  construct  such  as  typerec  or  typecase.  This  does  not  mean  however  that  the  idea  of 
intensional  type  analysis  has  been  abandoned:  rather,  the  type  analysis  has  been  hidden  inside  the 
primitives  that  need  to  use  it. 

For  example,  TILT  deals  with  floating  point  numbers  by  syntactically  distinguishing  between 
boxed  and  unboxed  floats,  with  appropriate  term  level  conversions  between  them.  This  allows 
the  optimizer  to  deal  directly  with  data  representation  optimizations,  even  at  the  relatively  high 
level  of  the  MIL.  The  syntactic  restriction  on  unboxed  floats  prevents  polymorphic  functions  from 
being  instantiated  with  the  unboxed  float  type,  so  that  all  polymorphic  values  take  up  32  bits. 
All  unboxed  floating  point  arguments  are  segregated  so  that  they  may  be  passed  in  float  registers. 




A  special  Farray  type  is  provided  corresponding  to  the  type  of  arrays  of  unboxed  floats,  with 
corresponding  term-level  operations  farray,  fsub  and  fupd. 

One  obvious  problem  with  this  is  that  arrays  of  values  whose  type  is  not  statically  known 
to  be  Float  would  seem  to  have  to  use  a  boxed  representation.  By  using  type  analysis  in  the 
array  constructor  as  well  as  the  subscript  operator  however,  at  least  some  of  the  difficulty  can  be 
avoided.  Essentially,  the  Array  constructor  incorporates  a  typecase  which  selects  the  appropriate 
array  representation  based  on  the  type  of  the  carried  values.  Similarly,  the  term-level  subscript 
and  update  operations  must  dispatch  on  the  type  to  select  the  appropriate  operation.  In  the  case 
that the type turns out to be Boxf, the subscript operation will also be forced to re-box the float
before returning it, since subscripting into an array of boxed floats returns a value of type Boxf.
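
The dispatch described above can be sketched in a few lines of Python. This is only an illustration, not TILT's implementation: the runtime type tags `BOXF` and `OTHER`, the `BoxedFloat` class, and the functions `make_array`, `sub` and `upd` are all hypothetical names chosen for the example, and Python's `array("d")` stands in for a flat buffer of unboxed 64 bit floats.

```python
from array import array
from dataclasses import dataclass

# Hypothetical runtime type representations (illustrative names only).
BOXF = "Boxf"    # the type of boxed floats
OTHER = "Other"  # any other word-sized element type


@dataclass
class BoxedFloat:
    """Stands in for a heap-allocated box holding a 64 bit float."""
    value: float


def make_array(ty, n, init):
    """Array constructor with a built-in typecase on the element type."""
    if ty == BOXF:
        # Elements are stored unboxed, in a flat buffer of doubles.
        return array("d", [init.value] * n)
    # Otherwise one word (here: one reference) per element.
    return [init] * n


def sub(ty, arr, i):
    """Subscript that dispatches on the element type."""
    if ty == BOXF:
        # The caller expects a Boxf, so the float is re-boxed on the way out.
        return BoxedFloat(arr[i])
    return arr[i]


def upd(ty, arr, i, v):
    """Update that dispatches on the element type."""
    if ty == BOXF:
        arr[i] = v.value  # unbox before storing
    else:
        arr[i] = v
```

The re-boxing in `sub` is the cost mentioned in the text: the unboxed layout saves space and indirections inside the array, at the price of an allocation whenever an element is read at type Boxf.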

Type  analysis  is  also  encoded  into  the  vararg  and  onearg  primitives  which  implement  special 
calling  conventions  for  single  argument  functions  whose  argument  type  is  a  small  record  type.  For 
example, the term $\mathsf{vararg}_{c_1 \to c_2}\ sv$ corresponds roughly to the following code using an explicit
style  typecase: 

    typecase c1 of
        Record[]      => lambda[]. sv <>
      | Record[t]     => lambda[x:t]. sv <x>
      | Record[t1,t2] => lambda[x1:t1, x2:t2]. sv <x1,x2>
      | _             => f


The  onearg  construct  is  the  inverse  of  vararg,  turning  a  variable  argument  function  back  to  a 
normal  function.  The  maximum  number  of  fields  that  can  be  flattened  (here  shown  as  two)  is  a 
machine  dependent  parameter  of  the  type  theory. 
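
As a rough illustration of the calling convention just described, the following Python sketch mimics vararg and onearg. Everything here is an assumption made for the example: runtime types are modelled as `("Record", [field types])` pairs, `MAX_FLATTEN` plays the role of the machine dependent limit (two, as in the example above), and Python's `*args` stands in for passing fields in separate argument positions.

```python
MAX_FLATTEN = 2  # assumed machine dependent limit on flattened fields


def _flattenable(arg_type):
    """True if arg_type is a hypothetical ("Record", fields) tag that is small enough."""
    return (
        isinstance(arg_type, tuple)
        and len(arg_type) == 2
        and arg_type[0] == "Record"
        and len(arg_type[1]) <= MAX_FLATTEN
    )


def vararg(arg_type, f):
    """Turn a one-argument function on records into a multi-argument function."""
    if _flattenable(arg_type):
        # Callers pass the fields separately; rebuild the record for f.
        return lambda *fields: f(tuple(fields))
    return f  # default arm: the function is left unchanged


def onearg(arg_type, g):
    """Inverse of vararg: accept a record and spread it over the arguments of g."""
    if _flattenable(arg_type):
        return lambda record: g(*record)
    return g
```

For example, with `pair = ("Record", ["Int", "Int"])` and `swap = lambda r: (r[1], r[0])`, the call `vararg(pair, swap)(1, 2)` passes the two fields as separate arguments, and `onearg(pair, vararg(pair, swap))` behaves like `swap` again.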

The  MIL  hides  type  analysis  by  replacing  certain  stylized  uses  of  typecase  with  primitives  that 
analyze  their  type  arguments.  At  some  point  however,  it  becomes  necessary  to  make  this  analysis 
explicit.  Currently  in  the  compiler,  this  happens  when  the  MIL  is  translated  to  a  low-level  untyped 
language.  One  of  the  major  challenges  in  pushing  type  information  down  to  the  machine  level  in 
TILT  is  making  this  analysis  explicit  in  a  typed  language  that  is  amenable  to  translation  to  a 
typed  assembly  language.  The  next  chapter  discusses  a  strategy  for  doing  this  by  reflecting  type 
analysis  into  the  term  level  of  a  more  powerful  lambda  calculus. 
Chapter 3


LIL 


There  is  an  apparent  disparity  between  the  type-passing  model  of  type  analysis  and  the  constructs 
available  at  the  machine  level.  Evaluation  of  terms  depends  on  the  type  arguments  as  well  as  the 
term  arguments,  but  real  machine  processors  are  untyped.  The  ad-hoc  solution  originally  used  in 
TILT  is  to  translate  to  an  untyped  setting,  choosing  term-level  representatives  for  the  type  data  in 
the  process.  Type  analysis  then  becomes  simple  branches  on  the  values  chosen  to  represent  various 
types.  This  untyped  approach  was  taken  because  of  the  absence  of  a  well-understood  type  system 
for  typing  such  code. 

In  order  to  make  this  ad-hoc  solution  viable  for  producing  typed  certified  code,  Crary,  Weirich, 
and  Morrisett  describe  a  type  theory  that  permits  term-level  representation  of  types  and  type 
analysis  in  a  typed  setting  [CWM98]  using  primitive  terms  to  represent  types.  Crary  and  Weirich 
subsequently extend this notion [CW99] to a type theory in which representation of types is definable
using only the ordinary lambda calculus term constructs with an enriched type system. The
LIL  adopts  this  idea  for  its  treatment  of  type  analysis,  and  also  extends  the  MIL  with  constructs 
for  representing  closures. 


3.1  The  LIL  syntax  and  static  semantics 

Figure  3.1  describes  the  complete  syntax  of  the  LIL.  As  with  the  MIL,  the  language  is  syntactically 
restricted  to  named  form.  This  is  not  particularly  necessary  for  theoretical  purposes,  but  matches 
more  closely  the  implementation. 

Kinds form the top of the syntactic hierarchy, and are generally written using the meta-variable
$\kappa$. Kinds classify the constructors, for which the meta-variable $c$ is generally used. In addition,
the meta-variables $\tau$ and $\phi$ are used to distinguish constructors of kind $T_{32}$ and $T_{64}$ respectively.
This is purely a presentational distinction however and does not correspond to an actual syntactic
distinction.

At the term level, the classes of 32 bit (small) and 64 bit values are notated as sv and fv respectively;
32 bit and 64 bit operations are notated as opr and fopr respectively; and expressions are notated
using the meta-variable e. Note that the distinction between 32 and 64 bit values is a syntactic
distinction and not merely a notational convention, and similarly for operations.

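LIL programs (p) consist of a mutually recursive set (d) of heap bindings (hval), and an executable
expression e. In the theoretical presentation here, code functions are the only heap values
permitted: however, in practice additional statically allocated data is placed in the heap section.*

The next several sections discuss some of the interesting syntactic aspects of the LIL and
introduce the relevant typing judgements. The complete static semantics for the LIL can be found
in appendix B. Many of the constructs described here are also discussed in detail by Crary and
Weirich [CW99].

$$
\begin{array}{rcl}
\kappa & ::= & T_{32} \mid T_{64} \mid 1 \mid \kappa_1 \to \kappa_2 \mid \kappa_1 \times \kappa_2 \\
 & \mid & +[\kappa_1, \ldots, \kappa_n] \mid j \mid \mu j.\kappa \mid \forall j.\kappa \\[4pt]
c, \tau, \phi & ::= & * \mid \alpha \mid \lambda\alpha{:}\kappa.c \mid c_1\,c_2 \\
 & \mid & (c_1, c_2) \mid \pi_1\,c \mid \pi_2\,c \\
 & \mid & \mathsf{case}(c, [\alpha_1.c_1, \ldots, \alpha_n.c_n]) \\
 & \mid & \mathsf{fold}_{\mu j.\kappa}\,c \mid \mathsf{pr}(j, \alpha{:}\kappa, \rho{:}(j \to \kappa'), \mathsf{in}\ c) \\
 & \mid & \mathsf{Float} \mid \mathsf{Int} \mid \mathsf{Boxed} \mid \mathsf{Void} \mid \times \mid {\to} \mid \mathsf{Code} \\
 & \mid & c[\kappa] \mid \Lambda j.c \mid \exists \mid \forall \mid \mathsf{Rec} \mid \vee \\
 & \mid & \mathsf{Array}_{32} \mid \mathsf{Array}_{64} \mid \mathsf{Dyntag} \mid \mathsf{Dyn} \mid \mathsf{Tag} \\[4pt]
fv & ::= & x_{64} \mid t \\[4pt]
sv & ::= & x \mid \ell \mid i \\
 & \mid & \mathsf{inj\_union}_{\tau}\ sv \mid \mathsf{inj\_dyn}_{\tau}(sv_1, sv_2) \\
 & \mid & \mathsf{unroll}_{\tau}\ sv \mid \mathsf{roll}_{\tau}\ sv \\
 & \mid & \mathsf{pack}\ sv\ \mathsf{as}\ \exists\alpha{:}\kappa.\tau\ \mathsf{hiding}\ c \mid sv[c] \mid \mathsf{tag}_i \\[4pt]
fopr & ::= & fv \mid \mathsf{unbox}\ sv \mid \mathsf{sub}_{\phi}(sv_1, sv_2) \\[4pt]
opr & ::= & sv \mid \mathsf{select}_i\ sv \\
 & \mid & \mathsf{case}(sv)(x.e_1, \ldots, x.e_n) \\
 & \mid & \mathsf{dyncase}(sv)(sv_1 \Rightarrow x_1.e_1,\ \_ \Rightarrow e) \\
 & \mid & \mathsf{dyntag}_{\tau} \mid \mathsf{raise}_{\tau}\ sv \mid \mathsf{handle}_{\tau}(e_1, x.e_2) \\
 & \mid & \mathsf{box}\ fv \mid \langle \vec{sv} \rangle \\
 & \mid & sv(\vec{sv})(\vec{fv}) \mid \mathsf{call}\ sv(\vec{sv})(\vec{fv}) \\
 & \mid & \mathsf{array}_{\tau}(sv_1, sv_2) \mid \mathsf{array}_{\phi}(sv, fv) \mid \mathsf{sub}_{\tau}(sv_1, sv_2) \\
 & \mid & \mathsf{upd}_{\phi}(sv_1, sv_2, fv) \mid \mathsf{upd}_{\tau}(sv_1, sv_2, sv_3) \\[4pt]
e & ::= & \mathsf{let\ rec}_{\tau}\ f[\alpha_1{::}\kappa_1, \ldots, \alpha_n{::}\kappa_n](x_1{:}\tau_1, \ldots, x_m{:}\tau_m)(z_1{:}\phi_1, \ldots, z_k{:}\phi_k).e\ \mathsf{in}\ e \\
 & \mid & sv \mid \mathsf{let}_{\tau}\ x = opr\ \mathsf{in}\ e \mid \mathsf{let}\ x_{64} = fopr\ \mathsf{in}\ e \\
 & \mid & \mathsf{let}\ [\alpha, x] = \mathsf{unpack}\ sv\ \mathsf{in}\ e \mid \mathsf{let}\ (\beta, \gamma) = c\ \mathsf{in}\ e \\
 & \mid & \mathsf{let}\ (\mathsf{fold}\ \beta) = c\ \mathsf{in}\ e \\
 & \mid & \mathsf{let}\ (\mathsf{inj}_i\ \beta) = (c, sv)\ \mathsf{in}\ e \\[4pt]
hval & ::= & \mathsf{code}_{\tau}[\alpha_1{::}\kappa_1, \ldots, \alpha_n{::}\kappa_n](x_1{:}\tau_1, \ldots, x_m{:}\tau_m)(z_1{:}\phi_1, \ldots, z_k{:}\phi_k).e \\
d & ::= & \epsilon \mid d, hval \\[4pt]
p & ::= & \mathsf{letrec}\ d\ \mathsf{in}\ e \\[4pt]
\Psi & ::= & \bullet \mid \Psi, \ell{:}\tau \\
\Delta & ::= & \bullet \mid \Delta, j \mid \Delta, \alpha{:}\kappa \\
\Gamma & ::= & \bullet \mid \Gamma, x{:}\tau \mid \Gamma, x_{64}{:}\phi
\end{array}
$$

Figure 3.1: LIL syntax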

3.1.1  Typing  contexts 


Context judgements

    Heap contexts           $\vdash \Psi\ \mathsf{ok}$
    Constructor contexts    $\vdash \Delta\ \mathsf{ok}$
    Term contexts           $\Delta \vdash \Gamma\ \mathsf{ok}$

There are three sorts of typing contexts for the LIL: heap contexts $\Psi$, constructor contexts
$\Delta$, and term contexts $\Gamma$. Heap contexts associate closed types with labels and are derived from
the top level heap of a program. Constructor contexts actually serve two purposes: they bind free
kind variables, and they bind free constructor variables at kinds (which may refer to previously
bound kind variables). In principle these two contexts could be separated, but since kind variables
are uni-typed it seems unnecessary. Term contexts bind 32 and 64 bit term variables at types,
which may refer to previously bound constructor variables. I assume that variables for each of
the different syntactic classes (constructors, kinds, 32 bit terms and 64 bit terms) are drawn from
mutually disjoint infinite sets.

A heap context is well-formed if all of the labels in its domain are distinct, and if each of the
types in its range is well-formed in an empty constructor context. This latter constraint reflects
the fact that heap values are bound at the top-level of programs.

A constructor context is well-formed if all of the kind variables in its domain are distinct; and if
all of the constructor variables in its domain are distinct, and if each kind in its range is well-formed
in the context preceding its binding.

A term context is well-formed if all of the 32 bit variables in its domain are distinct, and each type at
which a 32 bit variable is bound is well-formed at kind $T_{32}$; and if all of the 64 bit variables in its
domain are distinct, and if each type at which a 64 bit variable is bound is well-formed at kind $T_{64}$.

3.1.2  Constructors  and  Kinds 


Constructor and kind judgements

    Kinds           $\Delta \vdash \kappa\ \mathsf{ok}$
    Constructors    $\Delta \vdash c : \kappa$
    Equivalence     $\Delta \vdash c = c' : \kappa$

The constructor and kind typing judgements are unsurprising. The judgement $\Delta \vdash \kappa\ \mathsf{ok}$ defines
what it means for a kind $\kappa$ to be well-formed in a constructor context $\Delta$. The judgement $\Delta \vdash c : \kappa$
defines what it means for a constructor $c$ to be well-formed at a kind $\kappa$ in a context $\Delta$. And the
judgement $\Delta \vdash c = c' : \kappa$ defines when two constructors are equivalent. Note that the kind and the
context in the equivalence judgement are present for technical reasons: they should not change the
set of equivalences on well-formed constructors (though they do rule out equivalences on ill-formed
constructors). The complete definitions of these judgements are given in appendix B.

* The practice of referring to the static data segment of LIL programs as a “heap” reflects the standard terminology
of the literature in this area: however, this usage is slightly misleading since in practice a LIL “heap” is understood to
correspond more closely to the code and data segments of an executable image, rather than to dynamically allocated
heap space. Nonetheless, since this distinction is not apparent in the dynamic semantics of TILTAL, and since the
usage has become standard in the literature, I will continue to refer to this portion of a LIL program as the heap.
Sorry Bob!

Constants and their kinds

$$
\begin{array}{rcl}
\mathsf{Float} & : & T_{64} \\
\mathsf{Int} & : & T_{32} \\
\mathsf{Void} & : & T_{32} \\
\mathsf{Array}_{32} & : & T_{32} \to T_{32} \\
\mathsf{Array}_{64} & : & T_{64} \to T_{32} \\
\mathsf{Boxed} & : & T_{64} \to T_{32} \\
\mathsf{Tag} & : & \mathsf{nat} \to T_{32} \\
\mathsf{Dyntag} & : & T_{32} \to T_{32} \\
\mathsf{Dyn} & : & T_{32} \\
\times & : & T_{32}\,\mathsf{list} \to T_{32} \\
{\to} & : & T_{32}\,\mathsf{list} \to T_{64}\,\mathsf{list} \to T_{32} \to T_{32} \\
\vee & : & T_{32}\,\mathsf{list} \to T_{32} \\
\mathsf{Code} & : & T_{32}\,\mathsf{list} \to T_{64}\,\mathsf{list} \to T_{32} \to T_{32} \\
\forall & : & \forall j.(j \to T_{32}) \to T_{32} \\
\exists & : & \forall j.(j \to T_{32}) \to T_{32} \\
\mathsf{Rec} & : & \forall j.((j \to T_{32}) \to (j \to T_{32})) \to j \to T_{32}
\end{array}
$$

Figure 3.2: Constructor constants and their kinds

The most important change from the MIL is the enrichment of the kind structure: in particular,
the addition of sum kinds with corresponding introduction and elimination forms at
the constructor level. Universal kinds ($\forall j.\kappa$) and inductive kinds ($\mu j.\kappa$) are also provided. Kind
variables bound by inductive kinds are restricted to occur only positively. In order to provide for
the possibility of more general 64 bit types, the LIL uses an explicit kind distinction, providing
kinds $T_{32}$ and $T_{64}$ corresponding to the kind of the types of 32 and 64 bit expressions respectively.

The meta-variables $c$, $\tau$ and $\phi$ are used to represent constructors of arbitrary kind, kind $T_{32}$
and kind $T_{64}$ respectively. The lambda calculus portion of the constructor level contains the usual
introduction and elimination forms for sums, pairs, lambdas, unit, and kind abstraction. These
constructs are entirely standard. Inductive kinds are introduced with a $\mathsf{fold}_{\mu j.\kappa}\,c$ construct,
which injects a constructor $c$ of kind $\kappa[\mu j.\kappa/j]$ into the kind $\mu j.\kappa$.

The elimination form for inductive kinds is one of the most complex constructs in the LIL
and requires some explanation. The essential idea is to provide a form of primitive recursion over
inductive kinds at the constructor level. A primitive recursive constructor $\mathsf{pr}(j, \alpha{:}\kappa, \rho{:}(j \to \kappa'), \mathsf{in}\ c)$
recursively defines a function from $\mu j.\kappa$ to $\kappa'[\mu j.\kappa/j]$, with the body of the function given by $c$.
The variable $\alpha$ is the argument to the function, and stands for the unfolded argument (nominally
of kind $\kappa[\mu j.\kappa/j]$). However, in order to ensure that the function is only recursively called on a
sub-component of the argument (and hence is guaranteed to terminate), $\alpha$ is bound at kind $\kappa$ with
occurrences of $j$ left abstract, and the recursive variable $\rho$ always has $j$ as its domain kind.
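
The termination discipline can be imitated at the value level. The Python sketch below is only an analogy, under several assumptions of mine: lists are encoded as `("fold", ("inl", ()))` and `("fold", ("inr", (head, tail)))`, following the kind $\mu j.\,1 + \kappa \times j$; the class `Abstract` plays the role of the abstract kind variable $j$; and `pr`, `abstract_list_positions` and `length_body` are hypothetical names. The body never sees a folded value directly, so the only thing it can hand to `rho` is a strict sub-component.

```python
class Abstract:
    """Opaque wrapper standing for a recursive position of abstract kind j."""

    def __init__(self, hidden):
        self._hidden = hidden


def abstract_list_positions(unfolded):
    """Hide the recursive occurrence (the tail) of an unfolded list."""
    tag, payload = unfolded
    if tag == "inr":
        head, tail = payload
        return ("inr", (head, Abstract(tail)))
    return unfolded


def pr(body, abstract_positions):
    """Primitive recursion: body(alpha, rho) sees alpha with j left abstract."""

    def run(folded):
        tag, unfolded = folded
        if tag != "fold":
            raise ValueError("expected a folded value")

        def rho(position):
            # rho has domain j: it accepts only the abstracted sub-components.
            if not isinstance(position, Abstract):
                raise TypeError("rho may only be applied to a recursive position")
            return run(position._hidden)

        return body(abstract_positions(unfolded), rho)

    return run


def nil():
    return ("fold", ("inl", ()))


def cons(head, tail):
    return ("fold", ("inr", (head, tail)))


def length_body(alpha, rho):
    tag, payload = alpha
    if tag == "inl":
        return 0
    _head, tail = payload
    return 1 + rho(tail)  # recursive call on a sub-component only


length = pr(length_body, abstract_list_positions)
```

In the LIL the restriction is enforced statically by the kind system; the dynamic `isinstance` check here merely stands in for that guarantee.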

The constructors classifying the term level are presented in the LIL as constants of higher kind.
These constructors, given in figure 3.2, include constants for impredicative universal and existential
types, general parameterized recursive types, arrays, tagged values, sums, integers, floating point
numbers, boxed 64 bit values, pairs, and functions. Formulating these constructors as higher order
constants makes the interaction with type analysis easier to deal with. The kinds of several of the
constants refer to the kind of lists of constructors. This is defined as $\mathsf{list}[\kappa] = \mu j.\,1 + \kappa \times j$.
Throughout this dissertation I will frequently use the usual ML list syntax for constructor lists.
The kind nat describing the encoding of natural numbers is definable directly in the LIL in the
usual fashion, and is used in the typing rules as well. When $n$ is a natural number, I write $\overline{n}$ for
the constructor of kind nat representing $n$.

For the most part, the constructor constants are fairly straightforward. For example, $\times\,c$ corresponds
to the type of a tuple, the types of the fields of which are given by the elements of $c$. If
$c = [\tau_1, \tau_2]$, then this corresponds to $\tau_1 \times \tau_2$ in a more standard notation. Similarly, the arguments
to the $\to$ constructor correspond to the types of the 32 bit and 64 bit arguments of the function,
and its return type. Hence, ${\to}([\tau])([\phi])(\tau)$ corresponds to $(\tau; \phi) \to \tau$.

This  higher-order  abstract  syntax  methodology  is  very  convenient  for  both  the  theory  and  the 
implementation  but  can  interfere  with  readability.  For  example:  the  type  of  the  polymorphic 
pairing  function  written  in  this  style  appears  as 


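$$\forall[T_{32}](\lambda(\alpha{:}T_{32}).\forall[T_{32}](\lambda(\beta{:}T_{32}).{\to}[\alpha][\,]({\to}[\beta][\,](\times[\alpha,\beta]))))$$

(Note that even here I have used derived notation for lists). To enhance readability, I will frequently
make use of more standard notation. So in this example, I would write the type as:

$$\forall[\alpha{:}T_{32}, \beta{:}T_{32}].\ \alpha \to \beta \to (\alpha \times \beta)$$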

This  practice  is  also  done  extensively  in  the  actual  implementation  via  libraries  which  provide 
defined  forms  implementing  the  more  standard  type  constructors  in  terms  of  the  HOAS  style 
constructors. 
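
A library of defined forms of this kind might look roughly like the following Python sketch. It is not the TILT library: the helper names `forall`, `arrow` and `prod` are hypothetical, and constructors are rendered as plain strings purely so that the HOAS-style output can be compared with the type displayed above.

```python
def prod(fields):
    """Tuple type: the × constant applied to a list of field types."""
    return "×[" + ",".join(fields) + "]"


def arrow(args32, args64, ret):
    """Function type: → applied to 32 bit argument types, 64 bit argument types, and a result."""
    return "→[" + ",".join(args32) + "][" + ",".join(args64) + "](" + ret + ")"


def forall(bindings, body):
    """Standard ∀[α1:κ1, ..., αn:κn].body, expanded into nested ∀[κ](λ(α:κ). ...)."""
    for var, kind in reversed(bindings):
        body = "∀[" + kind + "](λ(" + var + ":" + kind + ")." + body + ")"
    return body


# The polymorphic pairing function, written with the derived forms.
pair_type = forall(
    [("α", "T32"), ("β", "T32")],
    arrow(["α"], [], arrow(["β"], [], prod(["α", "β"]))),
)
```

Here `pair_type` is the string `∀[T32](λ(α:T32).∀[T32](λ(β:T32).→[α][](→[β][](×[α,β]))))`, which matches the HOAS-style type of the pairing function given earlier.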

The  Tag  constructor  classifies  sum  tags,  and  takes  as  its  argument  a  constructor  of  kind  nat 
indicating  which  branch  of  the  sum  the  value  inhabits. 

Sum types are dealt with in the LIL using union types. In principle we allow injection into
any union type. However, the case construct limits its arguments to have union types composed
solely of tags or tagged records for which the tags are disjoint, and cover the full range of tags from
zero to the largest tag. So for example, the type $\vee[\mathsf{Tag}(\overline{0}), \times[\mathsf{Tag}(\overline{1}), \tau]]$ is inhabited by terms
which are either $\mathsf{tag}_0$ or a pair containing a value of type $\tau$ in its second field, tagged with $\mathsf{tag}_1$ in the first field.
This union type therefore describes a valid argument to case. In point of fact, the tag on the pair
is not necessary and could be elided. More generally, when $\tau$ is a pointer type, then both the tag and
the pointer indirection can be eliminated. Note though that in the presence of unknown types such
an optimization again requires type analysis. The actual implementation of sums in TILT in fact
performs this optimization. However, since it adds nothing new to the problem, the theoretical
discussion here assumes a simple treatment of sums in which all value-carrying arms are tagged.

The universal and existential constructors take as arguments a kind indicating the domain
of quantification and a constructor function giving the body. The standard $\forall(\alpha{:}\kappa).c$ becomes
$\forall[\kappa](\lambda(\alpha{:}\kappa).c)$, and similarly for the existential. Finally, parameterized recursive types are defined
with the Rec constructor. $\mathsf{Rec}[\kappa](\lambda(\rho{:}\kappa \to T_{32}).\lambda(\alpha{:}\kappa).c)(c')$ (where $c'$ has kind $\kappa$) corresponds to
recursively defining a constructor of kind $\kappa \to T_{32}$ and applying it to $c'$. Within the body of $c$, $\rho$ stands
for the recursively defined constructor, and $\alpha$ stands for the parameter. Therefore, the unfolding
of such a type is given by $c_r(\lambda(\alpha{:}\kappa).(\mathsf{Rec}[\kappa](c_r)(\alpha)))(c')$, where $c_r = (\lambda(\rho{:}\kappa \to T_{32}).\lambda(\alpha{:}\kappa).c)$.

In  order  to  allow  closure  conversion  to  be  done  within  the  LIL  language,  two  “function”  types 
are given. The $\to$ primitive type constructor when applied to arguments classifies functions in the
usual  sense.  The  Code  primitive  on  the  other  hand,  classifies  “code”  functions:  that  is,  code  that 
is  closed. 
3.1.3 Terms

Term judgements

    Small values         $\Psi; \Delta; \Gamma \vdash sv : \tau$
    64 bit values        $\Psi; \Delta; \Gamma \vdash fv : \phi$
    32 bit operations    $\Psi; \Delta; \Gamma \vdash opr : \tau\ \mathsf{opr}_{32}$
    64 bit operations    $\Psi; \Delta; \Gamma \vdash fopr : \phi\ \mathsf{opr}_{64}$
    Expressions          $\Psi; \Delta; \Gamma \vdash e : \tau\ \mathsf{exp}$
    Heap values          $\Psi \vdash hval : \tau\ \mathsf{hval}$
    Heaps                $\Psi \vdash d\ \mathsf{ok}$
    Programs             $\Psi \vdash p : \tau$

The expression level of the LIL is divided into five syntactic classes: 64 bit values (fv), 32 bit
values  (sv),  32  bit  operations  (opr),  64  bit  operations  (fopr)  and  expressions  (e).  Programs  are 
syntactically  restricted  to  a  named  form  where  all  intermediate  computations  are  named  in  the 
usual  fashion.  In  addition  to  the  five  classes  making  up  expressions,  there  are  an  additional  three 
syntactic  classes  making  up  full  LIL  programs:  heap  values  (hval),  heaps  (d),  and  programs  (p). 

The  well-formedness  judgements  for  the  five  expression  classes  are  all  of  the  same  essential 
form, defining what it means for a term to be well-formed at a type $\tau$ (or $\phi$) in heap context
$\Psi$, constructor context $\Delta$, and term context $\Gamma$. The other three judgements apply to heaps and
programs  which  do  not  have  free  variables  of  any  sort.  Therefore,  the  only  context  present  in  these 
judgements  is  the  heap  context,  which  describes  the  free  labels  of  a  heap  or  a  program.  As  usual, 
any  heap  value  may  refer  to  the  label  bound  to  any  other  heap  value:  that  is,  all  the  heap  values 
are  mutually  recursively  defined.  The  presence  of  a  heap  context  in  the  program  well-formedness 
judgement  leaves  open  the  possibility  of  compiling  programs  against  externally  defined  labels. 

Small  values 

Values include variables, constants, polymorphic instantiation, and existential introduction. Sum
tags $\mathsf{tag}_i$ are made explicit, and a coercion is introduced to inject terms into the union type.

64  bit  values 

The  only  64  bit  values  present  in  the  LIL  are  64  bit  float  constants  and  variables. 

32  bit  operations 

Operations  are  computations  that  return  values,  and  are  bound  to  variables  within  expressions. 
For  simplicity,  I  include  values  into  the  operations  to  unify  let  binding  into  a  single  mechanism. 
Other operations include unrolling of recursive types, tuple introduction and tuple selection,
sum case, known sum projection, exception constructs, and boxing of 64 bit values. Array update
and  creation  operations  are  provided  for  both  32  and  64  bit  arrays,  along  with  the  array  subscript 
operation  for  32  bit  arrays.  Note  that  all  memory  allocation  in  the  LIL  is  explicit  and  present  in  this 
level,  whether  through  the  tuple  introduction,  the  box  primitive  or  the  array  creation  primitives. 
The  only  exception  to  this  is  function  introduction  which  implicitly  allocates  a  closure:  this  is  dealt 
with  via  closure  conversion  which  turns  uses  of  functions  into  uses  of  tuples  and  code.  This  is 
described  in  detail  in  chapter  6  and  section  9.3. 
64 bit operations


The  64  bit  operations  include  unboxing  of  64  bit  values  and  64  bit  array  subscripts,  as  well  as  the 
inclusion  of  64  bit  values. 


Expressions 

Expressions  in  the  LIL  are  either  small  values,  or  let  bindings  of  any  of  several  forms.  Recursive 
function  binding,  32  bit  variable  binding,  and  64  bit  variable  binding  are  provided.  Existential 
unpacking  is  also  included  at  this  level.  The  most  interesting  expressions  however,  are  the  type 
refinement  bindings  that  support  the  technology  upon  which  type  analysis  in  the  LIL  is  built. 

The  most  important  of  these  is  the  vcase,  or  “virtual  case”  construct.  Crary  and  Weirich 
[CW99]  describe  a  system  for  implementing  type  analysis  as  case  analysis  on  constructor-level 
sums,  essentially  defining  typecase  in  terms  of  more  standard  type  theoretic  constructs.  They  also 
describe  a  variant  of  their  system  which  allows  for  a  type  erasure  interpretation  but  still  supports 
type  analysis  as  a  programming  idiom.  In  this  variant,  term-level  sums  are  used  to  stand  for 
analyzable  constructors.  The  vcase  construct  provides  a  mechanism  whereby  information  about 
the  identity  of  types  encoded  as  terms  can  be  reflected  back  into  the  type  level. 

To understand how this works, consider a value $v$ of type $\mathsf{case}(\alpha, \beta_1.c, \beta_2.\mathsf{Void})$. According to
the type given, $v$ is either of type $c$ or of type Void depending on whether $\alpha$ gets instantiated with
a left or a right injection. But since there are no closed values of type Void, it is apparent that $\alpha$
cannot be instantiated with a right injection, since to do so would imply that $v$ has type Void. In
essence, $v$ serves as a witness to the fact that $\alpha$ will be instantiated with the left injection.

This fact can be propagated back into the type system by using vcase. When the argument
to vcase is a variable, its arms are type-checked with the variable replaced with the appropriate
injection in the context and in the body of the arm (that is, in the left arm, the variable is
replaced with the left injection, and similarly with the right). Therefore, in the second branch of
the expression $\mathsf{vcase}(\alpha, \beta_1.e, \beta_2.\mathsf{dead}\ v)$ the value $v$ has type Void, which implies that this branch
cannot be reached and is dead code. Within the body of $e$, $\alpha$ is known to be the left injection, and
$v$ is known to have type $c$. When types are erased, it is sound to erase the vcase as well, since one
arm is known to be dead code. The vcase construct is the key to implementing type analysis, as
the subsequent section will show.
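
The refinement step can be illustrated with a small constructor normalizer. The Python sketch below is not the LIL type checker: the tuple encoding of constructors (`("var", name)`, `("const", name)`, `("inj", index, c)`, `("case", c, arms)`), the 0-based arm indices, and the function names are all assumptions of the example. The substitution is deliberately naive and assumes that bound variables are distinct from the free variables of the replacement.

```python
def subst(c, var, repl):
    """Replace free occurrences of variable `var` in constructor `c` by `repl`.

    Naive: assumes binders never clash with the free variables of `repl`.
    """
    tag = c[0]
    if tag == "var":
        return repl if c[1] == var else c
    if tag == "const":
        return c
    if tag == "inj":
        return ("inj", c[1], subst(c[2], var, repl))
    if tag == "case":
        scrutinee = subst(c[1], var, repl)
        arms = [
            (binder, body if binder == var else subst(body, var, repl))
            for binder, body in c[2]
        ]
        return ("case", scrutinee, arms)
    raise ValueError("unknown constructor form: " + str(tag))


def normalize(c):
    """Reduce case-of-injection redexes."""
    if c[0] == "case":
        scrutinee = normalize(c[1])
        if scrutinee[0] == "inj":
            binder, body = c[2][scrutinee[1]]
            return normalize(subst(body, binder, scrutinee[2]))
        return ("case", scrutinee, c[2])
    if c[0] == "inj":
        return ("inj", c[1], normalize(c[2]))
    return c


def refine(ty, var, arm_index, binder):
    """Type of a value as seen inside arm `arm_index` of a vcase on `var`."""
    return normalize(subst(ty, var, ("inj", arm_index, ("var", binder))))


# v : case(α, β1.Int, β2.Void), with Int standing in for the type c of the text.
v_type = ("case", ("var", "α"), [("β1", ("const", "Int")), ("β2", ("const", "Void"))])
left = refine(v_type, "α", 0, "β1")   # type of v in the live arm
right = refine(v_type, "α", 1, "β2")  # type of v in the dead arm
```

In this example `left` normalizes to `("const", "Int")` and `right` to `("const", "Void")`, mirroring the argument that the second arm can only be dead code.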

In addition, two special binding forms exist for refining constructor paths into variables for
analysis. When $\alpha$ has kind $\kappa_1 \times \kappa_2$, the pair refinement operator $\mathsf{let}\ (\beta, \gamma) = \alpha\ \mathsf{in}\ e$ replaces all
occurrences of $\alpha$ in $e$ with the pair $(\beta, \gamma)$. This means that projections from $\alpha$ can be turned
into variables. The vcase construct can only refine the type of its argument when the argument
is a variable: this pair refinement construct allows this to be extended to paths as well. The
$\mathsf{let}\ (\mathsf{fold}\ \beta) = \alpha\ \mathsf{in}\ e$ expression serves a similar purpose when $\alpha$ has a recursive kind.


Heap  values 

The  only  heap  values  currently  supported  in  the  theoretical  treatment  of  the  LIL  are  code  functions, 
which  are  necessary  for  closure  conversion.  Each  heap  value  is  required  to  be  closed  with  respect 
to  variables,  but  may  refer  to  any  other  heap  value  via  its  label. 
Heaps


A heap is a collection of mutually recursively defined heap values. A heap is well-formed in a heap
context if each of its constituent heap values is well-formed at the type at which its label is bound.
Note that I require that all of the heap labels be present in the heap context: for non-recursive
functions this is not strictly necessary, but it serves the additional purpose of enforcing the property
that all labels in the heap are distinct (via the well-formedness check on heap contexts).

$$
\frac{\vdash \Psi\ \mathsf{ok}}{\Psi \vdash \epsilon\ \mathsf{ok}}
\qquad
\frac{\Psi[\ell{:}\tau] \vdash hval : \tau\ \mathsf{hval} \qquad \Psi[\ell{:}\tau] \vdash d\ \mathsf{ok}}{\Psi[\ell{:}\tau] \vdash d,\ \ell{:}\tau \mapsto hval\ \mathsf{ok}}
$$


Programs 

LIL  programs  consist  of  a  heap,  which  binds  labels  to  heap  values,  and  an  expression  which 
computes  the  “value”  of  the  program.  Since  the  heap  values  are  (potentially)  mutually  recursive, 
the  heap  is  checked  in  a  context  including  bindings  for  all  of  the  labels  in  the  heap.  For  notational 
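purposes, I define an operation on heaps, $\Psi(d)$, that corresponds to the heap context produced by
taking the label and type from each binding in a heap:

$$
\begin{array}{rcl}
\Psi(\epsilon) & \stackrel{\mathrm{def}}{=} & \bullet \\
\Psi(d,\ \ell{:}\tau \mapsto hval) & \stackrel{\mathrm{def}}{=} & \Psi(d),\ \ell{:}\tau
\end{array}
$$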

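The well-formedness rule for programs simply checks the heap and the expression portion of the
program in the heap context extended with the bindings from the heap.

$$
\frac{\vdash \Psi, \Psi(d)\ \mathsf{ok} \qquad \Psi, \Psi(d) \vdash d\ \mathsf{ok} \qquad \Psi, \Psi(d); \bullet; \bullet \vdash e : \tau\ \mathsf{exp}}{\Psi \vdash \mathsf{letrec}\ d\ \mathsf{in}\ e : \tau}
$$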


3.2  Useful  properties  of  the  LIL 

Subsequent  proofs  rely  on  certain  properties  of  the  LIL:  in  particular,  that  well-typedness  is 
preserved  under  substitution  and  that  weakening  for  typing  contexts  is  admissible. 

Lemma 1 (LIL Substitution)

If $\Delta, \alpha{:}\kappa' \vdash c : \kappa$ and $\Delta \vdash c' : \kappa'$ then $\Delta \vdash c[c'/\alpha] : \kappa$.

Proof: (By induction on c)

We proceed by cases on the last rule of the derivation.

1. If $c$ is a constant or $*$, then $c[c'/\alpha] = c$. By assumption, $\Delta, \alpha{:}\kappa' \vdash c : \kappa$, so by strengthening,
$\Delta \vdash c : \kappa$.
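2. $\Delta, \alpha{:}\kappa' \vdash \alpha' : \kappa$.

• $\alpha' = \alpha$. Then $c[c'/\alpha] = c'$. By assumption, $\Delta \vdash c' : \kappa'$ and $\Delta, \alpha{:}\kappa' \vdash \alpha : \kappa$. But by
inspection of the typing rules, it is clear that the derivation must end in a use of the
variable rule, so $\kappa = \kappa'$, and hence $\Delta \vdash c' : \kappa$.

• $\alpha' \neq \alpha$. Then the result follows as in the constant case above.

3. $\Delta, \alpha{:}\kappa' \vdash \lambda(\beta{:}\kappa_\beta).c : \kappa_\beta \to \kappa_r$. Then by assumption, $\Delta, \alpha{:}\kappa', \beta{:}\kappa_\beta \vdash c : \kappa_r$. By induction,
$\Delta, \beta{:}\kappa_\beta \vdash c[c'/\alpha] : \kappa_r$. (Note that we assume that $\beta$ is chosen appropriately to avoid capture.)
Then by the lambda rule, $\Delta \vdash \lambda(\beta{:}\kappa_\beta).(c[c'/\alpha]) : \kappa_\beta \to \kappa_r$, and hence by definition of substitution,
$\Delta \vdash (\lambda(\beta{:}\kappa_\beta).c)[c'/\alpha] : \kappa_\beta \to \kappa_r$.

4. $\Delta, \alpha{:}\kappa' \vdash c_1\,c_2 : \kappa$. Then by assumption, $\Delta, \alpha{:}\kappa' \vdash c_1 : \kappa_2 \to \kappa$ and $\Delta, \alpha{:}\kappa' \vdash c_2 : \kappa_2$. By
induction, $\Delta \vdash c_1[c'/\alpha] : \kappa_2 \to \kappa$ and $\Delta \vdash c_2[c'/\alpha] : \kappa_2$. The result follows directly then from
the application rule and the definition of substitution: $\Delta \vdash (c_1\,c_2)[c'/\alpha] : \kappa$.

5. $\Delta, \alpha{:}\kappa' \vdash (c_1, c_2) : \kappa_1 \times \kappa_2$. Then by assumption $\Delta, \alpha{:}\kappa' \vdash c_1 : \kappa_1$ and $\Delta, \alpha{:}\kappa' \vdash c_2 : \kappa_2$. By
induction, $\Delta \vdash c_1[c'/\alpha] : \kappa_1$ and $\Delta \vdash c_2[c'/\alpha] : \kappa_2$. The result follows directly then from the
pair rule and the definition of substitution: $\Delta \vdash (c_1, c_2)[c'/\alpha] : \kappa_1 \times \kappa_2$.

6. $\Delta, \alpha{:}\kappa' \vdash \pi_1\,c_p : \kappa_1$. Then by assumption $\Delta, \alpha{:}\kappa' \vdash c_p : \kappa_1 \times \kappa_2$. By induction, $\Delta \vdash c_p[c'/\alpha] : \kappa_1 \times \kappa_2$.
The result follows from the projection rule and the definition of substitution.

7. $\Delta, \alpha{:}\kappa' \vdash \pi_2\,c_p : \kappa_2$. As for $\pi_1$.

8. $\Delta, \alpha{:}\kappa' \vdash \mathsf{inj}_i\,c : +[\kappa_1, \ldots, \kappa_n]$. Then by assumption $\Delta, \alpha{:}\kappa' \vdash c : \kappa_i$. By induction,
$\Delta \vdash c[c'/\alpha] : \kappa_i$. Finally, by the injection rule and the definition of substitution:

$\Delta \vdash (\mathsf{inj}_i\,c)[c'/\alpha] : +[\kappa_1, \ldots, \kappa_n]$

9. $\Delta, \alpha{:}\kappa' \vdash \mathsf{case}(c, [\alpha_1.c_1, \ldots, \alpha_n.c_n]) : \kappa$. By assumption:

• $\Delta, \alpha{:}\kappa' \vdash c : +[\kappa_1, \ldots, \kappa_n]$

• $\Delta, \alpha{:}\kappa', \alpha_i{:}\kappa_i \vdash c_i : \kappa$

By induction:

• $\Delta \vdash c[c'/\alpha] : +[\kappa_1, \ldots, \kappa_n]$

• $\Delta, \alpha_i{:}\kappa_i \vdash c_i[c'/\alpha] : \kappa$

(Note that the $\alpha_i$ may always be chosen to avoid capture). Then by the case rule and the
definition of substitution, $\Delta \vdash (\mathsf{case}(c, [\alpha_1.c_1, \ldots, \alpha_n.c_n]))[c'/\alpha] : \kappa$.

10. $\Delta, \alpha{:}\kappa' \vdash \mathsf{fold}_{\mu j.\kappa}\,c : \mu j.\kappa$. Then by assumption, $\Delta, \alpha{:}\kappa' \vdash c : \kappa[\mu j.\kappa/j]$ and $\Delta, \alpha{:}\kappa' \vdash \mu j.\kappa\ \mathsf{ok}$.
Since kinds do not mention constructor variables, $\Delta \vdash \mu j.\kappa\ \mathsf{ok}$ as well.
By induction, $\Delta \vdash c[c'/\alpha] : \kappa[\mu j.\kappa/j]$, and so by the fold rule and the definition of
substitution, $\Delta \vdash (\mathsf{fold}_{\mu j.\kappa}\,c)[c'/\alpha] : \mu j.\kappa$.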
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11.  A[a:«;']  h  pr(j, /3:ki, /3:(j  ^  K2),  inc)  ^  K2  Then  by  assumption, 

A[a:K]  h  pr{j,P:Ki,p:{j  K2),  inc) :  ^  K2 

By  induction 

A,j,P:Ki,p:{j  K2)  b  c[c7a]  :  ^2 

Note  that  we  can  choose  variables  appropriately  to  avoid  capture,  and  hence  by  the  primitive 
recursion  rule  and  the  definition  of  substitution 

A[a-.K]  h  pr(j,/3:Ki,p:(j  ^  K2),  inc)[c7a]  :  ^  ^2 

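12. $\Delta[\alpha{:}\kappa'] \vdash c[\kappa] : \kappa_2[\kappa/j]$.

Then by assumption:

$\Delta[\alpha{:}\kappa'] \vdash c : \forall j.\kappa_2$

By induction:

$\Delta \vdash c[c'/\alpha] : \forall j.\kappa_2$

By the kind application rule and the definition of substitution:

$\Delta \vdash c[\kappa][c'/\alpha] : \kappa_2[\kappa/j]$

13. $\Delta[\alpha{:}\kappa'] \vdash \Lambda j.c : \forall j.\kappa$. Then by assumption, $\Delta[\alpha{:}\kappa'], j \vdash c : \kappa$. By induction $\Delta, j \vdash c[c'/\alpha] : \kappa$.
Hence by the kind abstraction rule and the definition of substitution $\Delta \vdash (\Lambda j.c)[c'/\alpha] : \forall j.\kappa$.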
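
Every case above appeals to the definition of substitution, and the binding cases also rely on being able to rename a bound variable to avoid capture. As an illustration only, the following Python sketch implements capture-avoiding substitution for a three-form fragment (variables, abstraction, application). Kind annotations are omitted, and the names Var, Lam, App, free_vars, fresh and subst are hypothetical, chosen for this sketch rather than taken from the LIL.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str      # bound variable (kind annotation omitted in this sketch)
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


def free_vars(c):
    """Free constructor variables of c."""
    if isinstance(c, Var):
        return {c.name}
    if isinstance(c, Lam):
        return free_vars(c.body) - {c.param}
    return free_vars(c.fn) | free_vars(c.arg)


def fresh(avoid):
    """First name of the form v0, v1, ... that is not in `avoid`."""
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)


def subst(c, a, c_new):
    """Compute c[c_new/a], renaming bound variables to avoid capture."""
    if isinstance(c, Var):
        return c_new if c.name == a else c
    if isinstance(c, App):
        return App(subst(c.fn, a, c_new), subst(c.arg, a, c_new))
    if c.param == a:
        return c  # a is shadowed, so nothing to substitute below the binder
    if c.param in free_vars(c_new):
        # The binder would capture a free variable of c_new: rename it first.
        b = fresh(free_vars(c_new) | free_vars(c.body) | {a})
        renamed = subst(c.body, c.param, Var(b))
        return Lam(b, subst(renamed, a, c_new))
    return Lam(c.param, subst(c.body, a, c_new))
```

The renaming branch corresponds to the remark that the bound variable can always be chosen to avoid capture.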


A  few  structural  properties  of  LIL  contexts  are  important  as  well.  I  state  the  weakening  property 
and  give  a  sketch  of  the  proof.  I  remain  informal  about  re-ordering  properties  of  typing  contexts 
throughout,  but  in  the  absence  of  dependent  types  or  kinds  it  is  straightforward  (though  tedious) 
to  formalize  them.  I  will  also  occasionally  informally  use  simultaneous  weakening  to  combine  whole 
contexts: e.g. $\Delta, \Delta'$ where the intersection of the domains is empty. It should be clear that this
form  of  weakening  can  also  be  formalized  by  an  induction  on  the  second  context,  appealing  to  the 
unary  forms  of  the  weakening  lemmas  to  incrementally  construct  the  new  context. 

Lemma  2  (LIL  Weakening) 

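1. If $\Delta \vdash \kappa$ ok and $j \notin \Delta$, then $\Delta, j \vdash \kappa$ ok.

2. If $\Delta \vdash \kappa$ ok and $\Delta \vdash \kappa'$ ok and $\alpha \notin \Delta$, then $\Delta, \alpha{:}\kappa' \vdash \kappa$ ok.

3. If $\Delta \vdash c : \kappa$ and $\Delta \vdash \kappa'$ ok and $\alpha \notin \Delta$, then $\Delta, \alpha{:}\kappa' \vdash c : \kappa$.

4. If $\Delta \vdash c = c' : \kappa$ and $\Delta \vdash \kappa'$ ok and $\alpha \notin \Delta$, then $\Delta, \alpha{:}\kappa' \vdash c = c' : \kappa$.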

Proof  (Sketch):  Each  of  the  proofs  proceeds  similarly  by  induction  on  the  structure  of  typing 
derivations.  For  each  typing  rule,  I  inductively  construct  new  sub-derivations  for  each  premise  and 
use  the  original  rule  to  construct  the  new  derivation.  The  side-condition  on  binding  sites  does  not 
follow  immediately:  it  is  necessary  to  observe  that  it  is  always  possible  to  use  alpha-variance  to 
choose an appropriate bound variable different from $\alpha$. For premises of the form $\vdash \Delta$ ok, note that
the derivation of $\vdash \Delta, \alpha{:}\kappa'$ ok follows directly from the assumptions and the definition of context
well-formedness.

■ 

I also state an inversion property of LIL operations.

Lemma 3 (LIL inversion)

1. If $D$ is a derivation of $\Psi; \Delta; \Gamma \vdash opr : \tau\ \mathrm{opr}_{32}$, then $D$ has a unique last rule for any choice of $opr$.

2. If $D$ is a derivation of $\Psi; \Delta; \Gamma \vdash fopr : \phi\ \mathrm{opr}_{64}$, then $D$ has a unique last rule for any choice of $fopr$.

Proof: By inspection. No two operation rules apply to the same syntactic construct. ■

Chapter 4

Example: Floating point array flattening

4.1  An  optimized  array  strategy 

The  use  of  the  LIL  type  refinement  operations  discussed  in  the  previous  chapter  is  somewhat 
non-intuitive.  Before  attempting  to  give  a  full  account  of  the  translation  of  the  MIL,  it  is  useful 
to consider a small example demonstrating the type-analysis methodology used in the LIL. The
optimization I will describe here, floating point array flattening, is one of several implemented by
the TILT compiler. In this scheme, an array of boxed floating point numbers is implemented not
as an array of pointers, but as an array of the actual 64-bit values. Polymorphic array operations
dispatch on the type of the array contents to determine which primitives to use. In an informal
$\lambda_i^{ML}$ style notation, this corresponds to defining the following type:

\[
\mathsf{Array}_{opt}(\alpha) = \mathsf{Typecase}(\alpha)\ \mathsf{of}\ \mathsf{Boxed}(\phi) \Rightarrow \mathsf{Array}_{64}(\phi) \;\mid\; \_ \Rightarrow \mathsf{Array}_{32}(\alpha)
\]

As an example, I will show how to write a term level array constructor $\mathsf{array}_{opt}$ that can be used
to  construct  such  arrays. 

4.2  LIL  implementation  of  optimized  arrays 

Type  analysis  in  the  LIL  is  based  on  the  idea  of  first  encoding  types  as  abstract  syntax  trees,  and 
then  writing  functions  that  use  the  encodings  to  reconstruct  the  actual  types  or  to  choose  different 
branches  of  code.  In  fact,  it  is  not  necessary  to  encode  entire  types:  we  may  choose  an  encoding 
that  captures  only  the  information  that  is  useful  for  our  purposes. 

More  concretely,  we  first  choose  an  encoding  strategy  that  captures  the  features  of  interest 
about  a  type.  For  every  type  of  interest  (that  is,  every  type  which  is  to  be  analyzed)  we  construct 
two  object  level  items:  a  constructor  which  serves  as  a  static  encoding  of  the  type  (SE),  and  a  term 
which  serves  as  a  dynamic  encoding  of  the  type  (DE).  Static  encodings  of  types  are  used  during 
typechecking  to  reconstruct  the  encoded  type  and  to  connect  the  type  to  its  dynamic  encoding. 
Dynamic encodings are used to perform the actual runtime dispatch. For each of these two encodings,
there must also be a corresponding classifier describing it: the static encoding of a type is
classified by the static encoding kind (SEK), and the dynamic encoding of a type is classified by
its dynamic encoding type (DET).

4.2.1  Static  encodings 

At  the  kind  level,  sums  and  recursive  kinds  are  used  to  build  a  type’s  static  encoding.  For  the 
purposes  of  this  example  optimization,  the  encoding  can  be  very  simple:  we  need  only  to  be  able 
to  distinguish  between  boxed  64-bit  types  and  other  types.  This  is  reflected  in  the  static  encoding 
kind  for  our  example: 


\[
T_{opt} = T_{64} + T_{32} \qquad (*\ \text{Boxed of } (\phi) \mid \text{Other of } (\tau)\ *)
\]

The $T_{opt}$ kind classifies static encodings. The comment is meant to suggest the intuitive correspondence
between this definition and an ML-style representation of an abstract syntax tree.

Constructors of kind $T_{opt}$ encode types. In order to take advantage of these encodings, we replace
all constructors by their encodings. For example, polymorphic functions must expect encoded
types  as  their  type  arguments.  In  order  to  be  able  to  typecheck  applications  of  these  functions,  we 
must  be  able  to  reconstruct  the  original  type  from  its  encoding.  This  is  done  by  writing  an  object 
level  function  to  interpret  static  encodings  into  the  types  they  encode.  For  the  example  at  hand, 
this  function  simply  boxes  up  the  64  bit  types  and  leaves  the  32  bit  types  unchanged.  Hence: 

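\[
\mathsf{interp} : T_{opt} \to T_{32} = \lambda(\alpha{:}T_{opt}).\ \mathsf{case}\ \alpha\ \mathsf{of}\ \mathsf{inj}_1\,\beta \Rightarrow \mathsf{Boxed}(\beta) \;\mid\; \mathsf{inj}_2\,\beta \Rightarrow \beta
\]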

The  use  of  this  interpretation  function  can  be  seen  by  considering  the  static  encodings  of  boxed 
64-bit  floats  and  32-bit  ints: 

\[
C_{BF} : T_{opt} \stackrel{def}{=} \mathsf{inj}_1^{T_{opt}}\ \mathsf{Float}
\]

\[
C_{I} : T_{opt} \stackrel{def}{=} \mathsf{inj}_2^{T_{opt}}\ \mathsf{Int}
\]

Notice  that  the  interpretation  function  maps  each  of  these  back  to  the  original  type.  This  allows 
us  to  reconstruct  the  original  type  from  the  encoding.  An  important  point  to  note  here  is  that 
this  scheme  does  not  guarantee  that  all  boxed  floating  point  types  will  be  represented  as  a  first 
injection:  it  is  perfectly  possible  to  stuff  Boxed(Float)  into  the  second  arm  of  the  sum.  This  is  not 
a  problem,  since  it  merely  causes  arrays  of  such  types  to  be  represented  in  un-flattened  form.  In 
fact,  one  could  look  at  this  as  a  feature,  since  the  optimization  does  not  force  you  to  choose  one 
representation  or  the  other. 
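
To make the encoding concrete, here is a minimal Python sketch of static encodings and their interpretation. It models the two arms of the sum as classes Inj1 and Inj2 and uses a tiny stand-in universe of types (Float64, Int32, Boxed); all of these names are assumptions of the sketch, not LIL syntax.

```python
from dataclasses import dataclass


# A tiny, hypothetical universe of object-level types.
@dataclass(frozen=True)
class Float64:      # stands in for the 64-bit type Float
    pass


@dataclass(frozen=True)
class Int32:        # stands in for the 32-bit type Int
    pass


@dataclass(frozen=True)
class Boxed:        # a 32-bit pointer to a 64-bit payload
    payload: object


# Static encodings: the two injections into the sum T64 + T32.
@dataclass(frozen=True)
class Inj1:         # "Boxed of (phi)": carries a 64-bit type
    phi: object


@dataclass(frozen=True)
class Inj2:         # "Other of (tau)": carries a 32-bit type
    tau: object


def interp(enc):
    """Map a static encoding back to the type it encodes."""
    if isinstance(enc, Inj1):
        return Boxed(enc.phi)   # first arm: box up the 64-bit type
    return enc.tau              # second arm: already a 32-bit type


C_BF = Inj1(Float64())          # encoding of boxed floats
C_I = Inj2(Int32())             # encoding of ints
```

Note that interp(Inj2(Boxed(Float64()))) also yields Boxed(Float64()), which is the point made above: a boxed float may legitimately sit in the second arm, at the cost of an un-flattened array representation.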

Of  course,  simply  writing  the  interpretation  is  fairly  uninteresting  by  itself  since  it  merely  gets 
us  back  to  where  we  were  before  we  encoded  types.  The  real  benefit  of  encoding  types  in  this  fashion 
comes  from  other  useful  functions  that  we  can  write  using  encodings  of  types.  In  particular,  the 
actual  type  of  specialized  arrays  that  was  given  informally  above  can  now  be  written  down  directly. 

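\[
\mathsf{Array}_{opt} : T_{opt} \to T_{32} \stackrel{def}{=} \lambda(\alpha{:}T_{opt}).\ \mathsf{case}\ \alpha\ \mathsf{of}\ \mathsf{inj}_1\,\beta \Rightarrow \mathsf{Array}_{64}(\beta) \;\mid\; \mathsf{inj}_2\,\beta \Rightarrow \mathsf{Array}_{32}(\beta)
\]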

The $\mathsf{Array}_{opt}$ constructor is a function that maps encoded types to optimized array types. It takes
advantage of the additional information present in the encoded types to represent arrays of boxed
floats more efficiently. When the encodings are statically known, the optimized array type can be
reduced directly to an underlying primitive array type.

\[
\mathsf{Array}_{opt}(C_{BF}) = \mathsf{Array}_{64}(\mathsf{Float})
\]

\[
\mathsf{Array}_{opt}(C_{I}) = \mathsf{Array}_{32}(\mathsf{Int})
\]

When the actual encoding is abstract (for example in a polymorphic function) the composite array
constructor cannot be reduced definitively to a primitive array type, since it is not statically known
which sort of array can be expected.
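
The difference between known and abstract encodings can be sketched as a small normalizer. In this Python illustration (hypothetical names throughout), a known injection reduces to a primitive array type, while an abstract encoding, modelled by a TyVar, leaves the application stuck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inj1:         # first arm: a 64-bit type
    arg: object


@dataclass(frozen=True)
class Inj2:         # second arm: a 32-bit type
    arg: object


@dataclass(frozen=True)
class TyVar:        # an abstract encoding, e.g. a bound type variable
    name: str


def array_opt_type(enc):
    """Reduce the optimized array type applied to enc as far as possible."""
    if isinstance(enc, Inj1):
        return ("Array64", enc.arg)
    if isinstance(enc, Inj2):
        return ("Array32", enc.arg)
    # The case on a variable cannot be reduced: the type stays symbolic.
    return ("Array_opt", enc)
```

Inside a polymorphic function only the last form is available, which is why the term-level code must dispatch at run time.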

4.2.2  Dynamic  encodings 

Static  encodings  account  for  type  analysis  at  the  type  level  by  representing  types  as  constructor 
level  abstract  syntax  trees  and  by  writing  functions  which  analyze  the  structure  of  encodings  to 
produce  optimized  results.  Type  analysis  at  the  term  level  is  accounted  for  in  an  analogous  fashion 
using  dynamic  encodings.  In  addition  to  encoding  every  type  at  the  constructor  level,  we  also  give 
a  dynamic  encoding  of  every  type  at  the  term  level.  The  DE  of  a  type  can  then  be  used  to  dispatch 
at  runtime.  A  key  point  of  this  methodology  is  that  the  dynamic  and  static  encodings  of  a  type  are 
not  independent:  they  are  related  via  the  dynamic  encoding  type.  This  allows  information  gained 
via tests on the dynamic encoding to be reflected back onto the static encoding.

In  fact,  the  DET  of  a  type  is  simply  another  object  level  type  function  operating  on  the  SE 
of  the  type  in  the  same  fashion  as  the  interpretation  function  and  the  optimized  array  functions. 
For  the  example  at  hand,  the  type  of  the  dynamic  encoding  of  a  type  whose  static  encoding  is  c  is 
given  by  R{c),  where  R  is  defined  as  follows: 

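\[
\begin{array}{l}
R : T_{opt} \to T_{32} = \lambda(\alpha{:}T_{opt}).\ (\mathsf{case}\ \alpha\ \mathsf{of}\ \mathsf{inj}_1\,\beta \Rightarrow \mathsf{Unit} \mid \mathsf{inj}_2\,\beta \Rightarrow \mathsf{Void}) \\
\qquad\qquad\qquad\qquad\qquad\quad +\ (\mathsf{case}\ \alpha\ \mathsf{of}\ \mathsf{inj}_1\,\beta \Rightarrow \mathsf{Void} \mid \mathsf{inj}_2\,\beta \Rightarrow \mathsf{Unit})
\end{array}
\]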

This definition is a bit subtle, and is worth examining in detail. The principal high-level point is
that dynamic encodings of types will be values of sum type, which can be dispatched on using the
term level case construct. The subtlety arises in the types given for the two arms of the sum type
here. Metaphorically speaking, each arm of the sum here can be thought of as serving as a “proof”
of the identity of $\alpha$. The reasoning behind this is to observe that there are no closed values of type
Void. This tells us that any closed value of type $\mathsf{case}\ c\ \mathsf{of}\ \mathsf{inj}_1\,\beta \Rightarrow \mathsf{Unit} \mid \mathsf{inj}_2\,\beta \Rightarrow \mathsf{Void}$ must in
fact be unit. But this in turn tells us (informally) that $c$ can only be of the form $\mathsf{inj}_1\,\tau$.

So  the  informal  reasoning  of  the  previous  paragraph  tells  us  that  after  dispatching  on  the 
dynamic  encoding  of  a  type,  we  can  informally  infer  information  about  its  static  encoding.  In 
particular,  if  the  dynamic  encoding  is  a  left  injection,  then  the  static  encoding  must  similarly  be  a 
left injection; and similarly for right injections. In fact, as we shall see, the vcase construct gives
us a formal method for using this information.
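
The informal "proof" reading can be checked mechanically on closed encodings. The Python sketch below (names R, inhabitants, Inj1 and Inj2 are assumptions of the sketch) computes the two arm types of the sum for a given static encoding and enumerates its closed values, using the fact that Void has none and Unit has exactly one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inj1:
    arg: object


@dataclass(frozen=True)
class Inj2:
    arg: object


def R(enc):
    """Return the (left, right) arm types of the sum type for a closed encoding."""
    if isinstance(enc, Inj1):
        return ("Unit", "Void")
    if isinstance(enc, Inj2):
        return ("Void", "Unit")
    raise TypeError("expected a closed static encoding")


def inhabitants(enc):
    """Enumerate the closed values of the sum type computed by R."""
    left, right = R(enc)
    values = []
    if left == "Unit":
        values.append(("inj1", ()))   # left injection of unit
    if right == "Unit":
        values.append(("inj2", ()))   # right injection of unit
    return values
```

Each closed encoding has exactly one inhabitant, and its tag agrees with the tag of the encoding. That agreement is the information a run-time test on the dynamic encoding reveals about the static one.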

Dynamic  encodings  of  types  are  terms  whose  types  are  given  by  the  operation  of  R  on  their 
static  encoding.  So  for  example,  the  dynamic  encodings  of  boxed  64-bit  floats  and  of  32-bit  ints 
are  given  as  follows: 

\[
V_{BF} : R(C_{BF}) \stackrel{def}{=} \mathsf{inj}_1\, *
\]

\[
V_{I} : R(C_{I}) \stackrel{def}{=} \mathsf{inj}_2\, *
\]

4.2.3 Type analyzing terms


To re-cap, static encodings of types (such as $C_{BF}$) provide type level representations of the abstract
syntax of the analyzable types. Dynamic encodings of types (such as $V_{BF}$) provide term level
representations of the abstract syntax of the analyzable types, and are classified by types which
depend on the static encoding of the type.

In  this  framework,  functions  which  analyze  types  (given  in  the  MIL  by  special  primitives)  simply 
become  functions  which  take  as  arguments  the  static  and  dynamic  encodings  of  the  analyzable  types 
and  dispatch  on  them  appropriately.  The  type  of  the  function  to  construct  flattened  arrays  reflects 
this: 

\[
\mathsf{array}_{opt} : \forall(\alpha{:}T_{opt}).\ R(\alpha) \to \mathsf{Int} \to \mathsf{interp}(\alpha) \to \mathsf{Array}_{opt}(\alpha)
\]

The type argument and the first term argument correspond to the static and dynamic encodings
of the type of the contents of the array. The type of the initial value for the array is obtained by
interpreting the encoded type using the interp function, and the return type is given by interpreting
the encoded type using the $\mathsf{Array}_{opt}$ function.

The  actual  implementation  of  the  array  creation  function  simply  case  analyzes  the  dynamic 
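encoding to determine the appropriate primitive array operation to use. The code itself is
straightforward: the subtlety lies in understanding why the code is well-typed.

\[
\begin{array}{l}
\mathsf{array}_{opt} \stackrel{def}{=} \Lambda(\alpha{:}T_{opt}).\lambda(x_\alpha{:}R(\alpha)).\lambda(i{:}\mathsf{Int}).\lambda(y{:}\mathsf{interp}(\alpha)). \\
\quad \mathsf{case}\ x_\alpha \\
\quad\ \ \mathsf{of}\ \mathsf{inj}_1\, r \Rightarrow \mathsf{vcase}\ \alpha\ \mathsf{of}\ \mathsf{inj}_1\,\beta_1 \Rightarrow \mathsf{array}_{64}[\beta_1](i, \mathsf{unbox}(\mathbf{y})) \\
\qquad\qquad\qquad\qquad\qquad\quad \mid \mathsf{inj}_2\,\beta_2 \Rightarrow \mathsf{dead}\ \mathbf{r} \\
\quad\ \ \mid\ \mathsf{inj}_2\, r \Rightarrow \mathsf{vcase}\ \alpha\ \mathsf{of}\ \mathsf{inj}_1\,\beta_1 \Rightarrow \mathsf{dead}\ r \\
\qquad\qquad\qquad\qquad\qquad\quad \mid \mathsf{inj}_2\,\beta_2 \Rightarrow \mathsf{array}_{32}[\beta_2](i, y)
\end{array}
\]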

As expected, the type argument $\alpha$ has the kind of encoded types, $T_{opt}$. Similarly, the term
argument $x_\alpha$ has the type of dynamic encodings of $\alpha$, and will be instantiated with the dynamic
encoding of $\alpha$. The actual type of the contents of the array can only be referred to via the interp
function, since it is unknown at compile time. Therefore, the $y$ argument with which the array is
to be initialized is given type $\mathsf{interp}(\alpha)$. Similarly, the return type $\mathsf{Array}_{opt}(\alpha)$ gives the type of
the returned array as a function of what $\alpha$ turns out to be.

The  key  to  understanding  how  this  all  works  out  is  to  observe  the  effect  that  the  vcase  constructs 
have  on  the  types  of  the  variables  in  the  function.  Consider  just  the  first  branch  of  the  case  analysis 
on  Xa,  where  the  body  of  the  arm  does  a  virtual  case  analysis  of  a.  According  to  the  typing  rule 
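for vcase, the second arm of the vcase will be type-checked with $\mathsf{inj}_2\,\beta_2$ substituted everywhere
for $\alpha$, including in the context. This means that whereas outside the arm the variable $r$ has type

\[
\mathsf{case}\ \alpha\ \mathsf{of}\ \mathsf{inj}_1\,\beta \Rightarrow \mathsf{Unit} \mid \mathsf{inj}_2\,\beta \Rightarrow \mathsf{Void}
\]

within the arm it has type

\[
\mathsf{case}\ \mathsf{inj}_2\,\beta_2\ \mathsf{of}\ \mathsf{inj}_1\,\beta \Rightarrow \mathsf{Unit} \mid \mathsf{inj}_2\,\beta \Rightarrow \mathsf{Void}
\]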

which is equivalent to simply Void. This satisfies the typing rule for vcase, which requires the
dead branch to exhibit a value of type Void as proof that the branch is in fact dead.

To illustrate this further, the following table shows the types for the variables $r$ and $y$ inside
and outside of the first vcase, at the occurrences indicated in bold in the definition above.

  Outside of vcase                                                                                     Inside of vcase
  $y : \mathsf{case}\ \alpha\ \mathsf{of}\ \mathsf{inj}_1\,\beta \Rightarrow \mathsf{Boxed}(\beta) \mid \mathsf{inj}_2\,\beta \Rightarrow \beta$          $y : \mathsf{Boxed}(\beta_1)$
  $r : \mathsf{case}\ \alpha\ \mathsf{of}\ \mathsf{inj}_1\,\beta \Rightarrow \mathsf{Unit} \mid \mathsf{inj}_2\,\beta \Rightarrow \mathsf{Void}$            $r : \mathsf{Void}$

As  an  exercise,  the  reader  may  verify  that  when  called  with  the  appropriate  arguments,  the 
optimized  array  function  defined  above  does  in  fact  reduce  to  the  appropriate  32  or  64-bit  array 
primitive  based  on  the  representation  of  the  type  chosen. 

\[
\mathsf{array}_{opt}[C_{BF}](V_{BF})(10)(\mathsf{box}(0.0)) \mapsto^{*} \mathsf{array}_{64}[\mathsf{Float}](10, 0.0)
\]

\[
\mathsf{array}_{opt}[C_{I}](V_{I})(10)(0) \mapsto^{*} \mathsf{array}_{32}[\mathsf{Int}](10, 0)
\]

Chapter 5

The MIL to LIL translation


The  translation  from  the  MIL  into  the  LIL  language  is  primarily  interesting  in  that  it  makes  the 
uses  of  type  analysis  in  the  MIL  primitives  explicit  and  adds  representations  for  types  so  that  this 
analysis  can  be  done  at  the  term  level.  This  implements  in  a  typed  fashion  what  is  done  currently  in 
an  untyped  setting.  The  type  analysis  methodology  for  this  is  essentially  the  same  as  that  described 
in  the  previous  chapter,  but  extended  to  handle  additional  type  analyzing  operations. 

The  first  section  of  this  chapter  gives  a  high-level  overview  of  the  translation,  introducing  the 
relations  and  stating  some  of  the  major  typing  properties.  These  translations  are  developed  in 
detail  in  successive  sections,  along  with  proofs  of  their  soundness. 


5.1  Translation  overview 

The translation of MIL programs into LIL programs is defined by several inductively defined
relations  between  elements  of  the  MIL  syntactic  classes  and  their  correspondents  in  the  LIL. 
These  relations  may  be  broadly  grouped  into  four  classes:  those  concerned  with  the  static  encoding 
of  constructors,  those  concerned  with  the  dynamic  encoding  of  constructors,  those  concerned  with 
proper  MIL  types,  and  those  concerned  with  the  expression  translation  itself. 

The first group consists of the kind and constructor translations, and the corresponding translation
on constructor typing contexts.


Static encoding translations

  SEK          $|\kappa|$    $\bullet \vdash |\kappa|$ ok
  SE context   $|\Delta|$    $\vdash |\Delta|$ ok
  SE           $|c|$         $|\Delta| \vdash |c| : |\kappa|$

The  kind  translation  replaces  MIL  kinds  (closed  by  definition)  with  closed  LIL  kinds  classifying  the 
translation  of  MIL  constructors  of  the  original  kind.  The  static  encoding  translation  for  contexts 
($|\Delta|$) simply applies the kind translation across the range of a MIL typing context. This allows
the statement of the desired typing property of the constructor translation: that if $\Delta \vdash c : \kappa$, then
$|\Delta| \vdash |c| : |\kappa|$ (theorem 2). (There are of course a number of other auxiliary typing properties to be
shown:  these  are  covered  in  more  detail  in  subsequent  sections.) 

The second group, primarily concerned with the dynamic encoding of constructors, consists
of additional translations on constructors and typing contexts and a relation on constructors and
kinds.


Dynamic encoding translations

  DET          $\llbracket c : \kappa \rrbracket$    $|\Delta| \vdash \llbracket c : \kappa \rrbracket : T_{32}$
  DE context   $\llbracket \Delta \rrbracket$        $|\Delta| \vdash \llbracket \Delta \rrbracket$ ok
  DE           $\llbracket c \rrbracket$             $\bullet; |\Delta|; \llbracket \Delta \rrbracket \vdash \llbracket c \rrbracket : \llbracket c : \kappa \rrbracket$ exp

The first translation, $\llbracket c : \kappa \rrbracket$, is a relation between MIL constructor/kind pairs and LIL types. It
defines the dynamic encoding type for a constructor $c$ of kind $\kappa$. The inhabitants of this type are
the dynamic encodings of constructors, and are produced by a relation between MIL constructors
and LIL expressions: $\llbracket c \rrbracket$. The dynamic encoding translation for contexts maps MIL constructor
contexts to LIL term contexts. The domain of the new context is constructed using an injection
from type variables to term variables, while the range is constructed using the static encoding type
translation. It should be the case that if $\Delta \vdash c : \kappa$, then $\Psi; |\Delta|; \llbracket \Delta \rrbracket \vdash \llbracket c \rrbracket : \llbracket c : \kappa \rrbracket$ exp (theorem 3).

The third group is concerned with the translation of proper MIL types (as opposed to constructors).


Type translations

  Types          $|\tau|$      $|\Delta| \vdash |\tau| : T_{32}$
  Term context   $|\Gamma|$    $|\Delta| \vdash |\Gamma|$ ok

The type translation is a relation between proper MIL types and LIL constructors of kind $T_{32}$, and
the term context translation simply maps this translation across the range of MIL term contexts.
It should be the case that if $\Delta \vdash \tau$ ok, then $|\Delta| \vdash |\tau| : T_{32}$.

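The final group of translations relates MIL terms of various syntactic classes to their corresponding
LIL terms.


Term translations

  Small values        $\Delta; \Gamma \vdash sv : \tau \leadsto sv'$                    $\Psi; |\Delta|; \llbracket \Delta \rrbracket, |\Gamma| \vdash sv' : |\tau|$
  Float values        $\Delta; \Gamma \vdash fv : \mathsf{Float} \leadsto fv'$          $\Psi; |\Delta|; \llbracket \Delta \rrbracket, |\Gamma| \vdash fv' : \mathsf{Float}$
  Operations          $\Delta; \Gamma \vdash opr : \tau \leadsto opr'$                  $\Psi; |\Delta|; \llbracket \Delta \rrbracket, |\Gamma| \vdash opr' : |\tau|\ \mathrm{opr}_{32}$
  64 bit Operations   $\Delta; \Gamma \vdash fopr : \mathsf{Float} \leadsto fopr'$      $\Psi; |\Delta|; \llbracket \Delta \rrbracket, |\Gamma| \vdash fopr' : \mathsf{Float}\ \mathrm{opr}_{64}$
  Expression          $\Delta; \Gamma \vdash e : \tau \leadsto e'$                      $\Psi; |\Delta|; \llbracket \Delta \rrbracket, |\Gamma| \vdash e' : |\tau|$ exp

It is convenient to phrase these as typed translations since the additional type information is sometimes
required in order to construct the appropriate LIL syntax. The intended typing properties
of these translations should be clear. For example, it should be the case that if $\Delta; \Gamma \vdash e : \tau$ and
$\Delta; \Gamma \vdash e : \tau \leadsto e'$ then $\Psi; |\Delta|; \llbracket \Delta \rrbracket, |\Gamma| \vdash e' : |\tau|$ exp (for appropriate heap contexts $\Psi$, theorem 8).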

5.2  Static  encodings  of  constructors 


A  static  encoding  is  a  LIL  constructor  which  encodes  information  about  the  MIL  constructor 
which  it  represents.  The  static  encoding  can  be  analyzed  at  the  type  level  to  determine  what  type 
it  represents.  It  can  also  be  translated  by  an  object  level  interpretation  function  to  determine  the 
actual  LIL  type  corresponding  to  the  MIL  type  that  it  represents.  This  section  develops  these 
mechanisms, beginning with the translation of kinds.

5.2.1 The kind translation

The  choice  of  what  information  to  capture  in  the  encoding  depends  entirely  on  the  kind  of  type 
analysis  optimization  that  is  desired.  In  the  MIL,  there  are  two  major  uses  of  type  analysis: 
an  array  flattening  optimization  and  the  vararg  optimization.  The  array  optimization  is  almost 
exactly  as  described  in  chapter  4,  and  requires  the  encoding  strategy  to  distinguish  between  boxed 
floats  and  other  types.  The  vararg  optimization  is  described  in  chapter  2,  and  requires  the  ability 
to  distinguish  between  records  of  various  widths  and  other  types  so  that  the  vararg  and  onearg 
operations  can  choose  an  appropriate  calling  convention.  For  the  purposes  of  this  translation  then, 
it  is  sufficient  for  the  encoding  to  capture  three  classes  of  types:  records  (with  their  widths),  boxed 
floating  point  numbers,  and  all  other  types.  In  fact,  the  actual  implementation  sub-divides  the  last 
category  into  pointer  types  and  non-pointer  types  in  order  to  implement  a  further  optimization  on 
sums  that  is  not  discussed  here,  as  it  adds  nothing  substantial  to  the  discussion. 

This  division  of  types  into  categories  is  apparent  in  the  static  encoding  kind  that  we  choose  for 
the  translation: 

\[
T_{mil} \stackrel{def}{=} T_{32}\,\mathsf{list} + 1 + T_{32}
\]

Intuitively, $T_{mil}$ corresponds to an ML datatype as follows:

datatype Tmil = Record of T32 list | BFloat | Other of T32

For  presentational  purposes,  I  will  often  use  ML  pattern  matching  style  notation  using  this  intuitive 
correspondence. 

Recall  that  the  SEK  is  the  kind  that  classifies  static  encodings.  What  this  definition  tells  us 
is  that  encoded  constructors  are  either  a  list  of  types  corresponding  to  the  types  of  the  fields  of  a 
record,  a  boxed  floating  point  number,  or  some  other  unknown  type.  Note  that  since  the  only  64 
bit  type  in  the  MIL  is  Float,  it  is  not  necessary  to  include  any  information  in  the  second  arm  of 
the  sum.  If  we  wished  to  allow  arbitrary  64  bit  types  to  be  flattened  into  arrays,  we  could  replace 
the 1 in the second arm of the sum with a $T_{64}$.

The  previous  definition  tells  us  what  the  kind  of  encodings  of  proper  MIL  types  looks  like. 
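Arbitrary MIL kinds are translated into LIL kinds simply by replacing all occurrences of $T_{32}$ with
$T_{mil}$, and leaving the rest of the structure intact.

\[
\begin{array}{rcl}
|T_{32}| & \stackrel{def}{=} & T_{mil} \\
|\kappa_1 \to \kappa_2| & \stackrel{def}{=} & |\kappa_1| \to |\kappa_2| \\
|\kappa_1 \times \kappa_2| & \stackrel{def}{=} & |\kappa_1| \times |\kappa_2|
\end{array}
\]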
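
The kind translation is a plain structural recursion, as the following Python sketch shows. The classes T32, Tmil, Arrow and Prod are hypothetical stand-ins for the kind syntax, introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class T32:          # the MIL kind of 32-bit types
    pass


@dataclass(frozen=True)
class Tmil:         # the LIL kind of static encodings
    pass


@dataclass(frozen=True)
class Arrow:        # function kind
    dom: object
    cod: object


@dataclass(frozen=True)
class Prod:         # product kind
    fst: object
    snd: object


def translate_kind(k):
    """Replace every T32 by Tmil and leave the rest of the kind intact."""
    if isinstance(k, T32):
        return Tmil()
    if isinstance(k, Arrow):
        return Arrow(translate_kind(k.dom), translate_kind(k.cod))
    if isinstance(k, Prod):
        return Prod(translate_kind(k.fst), translate_kind(k.snd))
    raise TypeError(f"not a MIL kind: {k!r}")
```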


Following  the  methodology  described  in  chapter  4,  the  first  important  object  level  type  function 
we  must  define  to  assist  with  the  encoding  is  an  interpretation  function  which  captures  the  meaning 
of  the  encoding  in  terms  of  the  underlying  types.  This  is  done  by  defining  an  interpretation  function 
interp of kind $T_{mil} \to T_{32}$.

\[
\begin{array}{l}
\mathsf{interp} : T_{mil} \to T_{32} = \lambda(\alpha{:}T_{mil}). \\
\quad \mathsf{case}\ \alpha \\
\qquad\ \ \mathbf{Record}\ l \Rightarrow \times(l) \\
\qquad \mid \mathbf{BFloat} \Rightarrow \mathsf{Boxed}(\mathsf{Float}) \\
\qquad \mid \mathbf{Other}\ t \Rightarrow t
\end{array}
\]

For clarity, I use an ML style pattern matching notation based on the informal datatype definition
given above. It should be completely clear how to translate this to formal syntax simply by replacing
uses of Record $l$, for example, with $\mathsf{inj}_1\ l$.

Notice  that  this  function  is  an  object  level  function  as  opposed  to  a  meta-level  translation. 
This  is  necessary,  since  type  abstraction  means  that  the  interpretation  cannot  always  be  statically 
computed. 
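
As a rough illustration of the three-way encoding, the interpretation function can be written in Python over a small algebraic datatype. Record, BFloat and Other mirror the informal ML datatype above; Times and Boxed are hypothetical stand-ins for the LIL record and box type constructors, and base types are represented as strings.

```python
from dataclasses import dataclass
from typing import Tuple


# Static encodings (the informal ML datatype).
@dataclass(frozen=True)
class Record:
    fields: Tuple[object, ...]   # the types of the record fields


@dataclass(frozen=True)
class BFloat:
    pass


@dataclass(frozen=True)
class Other:
    ty: object                   # any other 32-bit type


# Stand-ins for LIL types.
@dataclass(frozen=True)
class Times:
    fields: Tuple[object, ...]   # record type built from a list of types


@dataclass(frozen=True)
class Boxed:
    ty: object


def interp(enc):
    """Recover the LIL type denoted by a static encoding."""
    if isinstance(enc, Record):
        return Times(enc.fields)
    if isinstance(enc, BFloat):
        return Boxed("Float")
    if isinstance(enc, Other):
        return enc.ty
    raise TypeError(f"not a static encoding: {enc!r}")
```

In the LIL itself the function is applied symbolically by the typechecker, and an application to a type variable stays unreduced. That case has no counterpart in this sketch, which handles closed encodings only.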


5.2.2  The  constructor  translation 

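The actual constructor translation $|c|$ translates MIL constructors of kind $\kappa$ to LIL constructors
of kind $|\kappa|$. This means that constructors of kind $T_{32}$ will be mapped to LIL constructors of kind
$T_{mil}$. For clarity in the translation, I begin by defining some LIL functions that serve to construct
static encodings of types. These serve as “constructors” for the ML style notation suggested by
the datatype given above.

\[
\begin{array}{lcl}
\mathbf{Record} : T_{32}\,\mathsf{list} \to T_{mil} & \stackrel{def}{=} & \lambda(\alpha{:}T_{32}\,\mathsf{list}).\ \mathsf{inj}_1^{T_{mil}}\ \alpha \\
\mathbf{BFloat} : T_{mil} & \stackrel{def}{=} & \mathsf{inj}_2^{T_{mil}}(\mathsf{inj}_1\ *) \\
\mathbf{Other} : T_{32} \to T_{mil} & \stackrel{def}{=} & \lambda(\alpha{:}T_{32}).\ \mathsf{inj}_2^{T_{mil}}(\mathsf{inj}_2\ \alpha)
\end{array}
\]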

Note  that  for  syntactic  clarity  I  frequently  leave  off  the  kind  decoration  on  sum  injections  where  it 
is  obvious  from  context.  Frequently,  throughout  this  dissertation  I  will  use  boldface  for  names  of 
defined  forms  such  as  these  to  distinguish  them  from  primitive  syntax. 

The actual static encoding translation proceeds for the most part compositionally over the structure
of constructors. Constructors of higher kind are translated directly by recursively translating
their sub-components. All other constructors are encoded into the $T_{mil}$ kind.

It is convenient to define object level functions for use in the translation and elsewhere that
correspond exactly to the type analyzing primitives from the MIL: $\mathsf{Array}_c$ and $\mathsf{Vararg}_{c_1 \to c_2}$. This
should again be familiar from the methodology developed in chapter 4.

Because  of  the  special  treatment  of  arrays  of  boxed  floating  point  numbers,  the  Array  type 
needs  to  analyze  the  type  of  values  the  array  contains.  This  is  implemented  by  translating  the 
Array  type  into  a  sum  switch  on  the  type.  If  the  type  being  encoded  turns  out  at  run  time  to  be  a 
boxed  floating  point  number,  then  the  case  statement  will  return  the  64  bit  array  type:  otherwise, 
it  returns  the  32  bit  version.  Note  that  in  the  record  case,  it  must  explicitly  reconstruct  the  record 
using  the  constituent  field  types. 

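\[
\begin{array}{l}
\mathbf{Array} : T_{mil} \to T_{32} \stackrel{def}{=} \lambda(\alpha{:}T_{mil}).\ \mathsf{case}\ \alpha \\
\qquad\ \ \mathbf{Record}\ \beta \Rightarrow \mathsf{Array}_{32}(\times(\beta)) \\
\qquad \mid \mathbf{BFloat} \Rightarrow \mathsf{Array}_{64}(\mathsf{Float}) \\
\qquad \mid \mathbf{Other}\ t \Rightarrow \mathsf{Array}_{32}(t)
\end{array}
\]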

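\[
\begin{array}{rcl}
|\alpha| & \stackrel{def}{=} & \alpha \\
|\lambda(\alpha{:}\kappa).c| & \stackrel{def}{=} & \lambda(\alpha{:}|\kappa|).|c| \\
|c_1\, c_2| & \stackrel{def}{=} & |c_1|\,|c_2| \\
|\pi_1\, c| & \stackrel{def}{=} & \pi_1\, |c| \\
|\pi_2\, c| & \stackrel{def}{=} & \pi_2\, |c| \\
|(c_1, c_2)| & \stackrel{def}{=} & (|c_1|, |c_2|) \\
|\mathsf{Int}| & \stackrel{def}{=} & \mathbf{Other}(\mathsf{Int}) \\
|\mathsf{Sum}_i(c_1, \ldots, c_n)| & \stackrel{def}{=} & \mathbf{Other}(\vee[\mathsf{Tag}(0), \ldots, \mathsf{Tag}(i-1), \times[\mathsf{Tag}(i), \mathsf{interp}\,|c_i|], \ldots, \times[\mathsf{Tag}(n), \mathsf{interp}\,|c_n|]]) \\
|\mathsf{Sum}_i^j(c_1, \ldots, c_n)| & \stackrel{def}{=} & \mathbf{Other}(\mathsf{Tag}(j)) \qquad j < i \\
|\mathsf{Sum}_i^j(c_1, \ldots, c_n)| & \stackrel{def}{=} & \mathbf{Other}(\times[\mathsf{Tag}(j), \mathsf{interp}\,|c_j|]) \qquad i \le j < n \\
|\mathsf{Exn}| & \stackrel{def}{=} & \mathbf{Other}(\exists(\alpha{:}T_{32}).(\alpha \times (\mathsf{Dyntag}\ \alpha))) \\
|\mathsf{Dyntag}_c| & \stackrel{def}{=} & \mathbf{Other}(\mathsf{Dyntag}(\mathsf{interp}\,|c|)) \\
|\mathsf{Farray}| & \stackrel{def}{=} & \mathbf{Other}(\mathsf{Array}_{64}\,\mathsf{Float}) \\
|\mathsf{Boxf}| & \stackrel{def}{=} & \mathbf{BFloat} \\
|\mathsf{Unit}| & \stackrel{def}{=} & \mathbf{Record}[\,] \\
|\times(c)| & \stackrel{def}{=} & \mathbf{Record}[\mathsf{interp}\,|c|] \\
|c_1 \times c_2| & \stackrel{def}{=} & \mathbf{Record}[\mathsf{interp}\,|c_1|, \mathsf{interp}\,|c_2|] \\
|c_1 \times \ldots \times c_n| & \stackrel{def}{=} & \mathbf{Record}([\mathsf{interp}\,|c_1|, \ldots, \mathsf{interp}\,|c_n|]) \\
|(c_1, \ldots, c_n) \to c| & \stackrel{def}{=} & \mathbf{Other}((\mathsf{interp}\,|c_1|, \ldots, \mathsf{interp}\,|c_n|) \to \mathsf{interp}\,|c|) \\
|\mathsf{Array}_c| & \stackrel{def}{=} & \mathbf{Other}(\mathbf{Array}(|c|)) \\
|\mathsf{Vararg}_{c_1 \to c_2}| & \stackrel{def}{=} & \mathbf{Other}(\mathbf{Vararg}(|c_1|)(\mathsf{interp}\,|c_2|)) \\
|\mu(\alpha, \beta).(c_1, c_2)| & \stackrel{def}{=} & (\mathbf{Other}(\mathsf{Rec}[1+1](f)(\mathsf{inj}_1\, *)),\ \mathbf{Other}(\mathsf{Rec}[1+1](f)(\mathsf{inj}_2\, *))) \\
& & \text{where } f = \lambda(\rho{:}1+1 \to T_{32}).\lambda(\omega{:}1+1).\ \mathsf{case}\ \omega \\
& & \qquad\ \ \mathsf{inj}_1\,\_ \Rightarrow \mathsf{interp}(|c_1|[\mathbf{Other}(\rho(\mathsf{inj}_1\, *))/\alpha, \mathbf{Other}(\rho(\mathsf{inj}_2\, *))/\beta]) \\
& & \qquad \mid \mathsf{inj}_2\,\_ \Rightarrow \mathsf{interp}(|c_2|[\mathbf{Other}(\rho(\mathsf{inj}_1\, *))/\alpha, \mathbf{Other}(\rho(\mathsf{inj}_2\, *))/\beta])
\end{array}
\]

Figure 5.1: The constructor translation (static encoding)

Vararg types similarly become dispatches over the argument types to select the appropriate
function type: either a flattened function type for small records, or otherwise a standard function
type that takes its argument as a single value. Throughout this translation, we assume that only
records with fewer than 3 fields get flattened into registers, but in general the number of fields is a
machine dependent parameter of the translation.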

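\[
\begin{array}{l}
\mathbf{Vararg} : T_{mil} \to T_{32} \to T_{32} \stackrel{def}{=} \lambda(\alpha{:}T_{mil}).\lambda(\beta{:}T_{32}).\ \mathsf{case}\ \alpha \\
\qquad\ \ \mathbf{Record}\ l \Rightarrow (\mathsf{case}\ l \\
\qquad\qquad\qquad\ \ [\,] \Rightarrow () \to \beta \\
\qquad\qquad\qquad \mid [t] \Rightarrow (t) \to \beta \\
\qquad\qquad\qquad \mid [t_1, t_2] \Rightarrow (t_1, t_2) \to \beta \\
\qquad\qquad\qquad \mid \_ \Rightarrow (\times(l)) \to \beta) \\
\qquad \mid \mathbf{BFloat} \Rightarrow (\mathsf{Boxed}\ \mathsf{Float}) \to \beta \\
\qquad \mid \mathbf{Other}\ t \Rightarrow (t) \to \beta
\end{array}
\]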

Note  that  the  Vararg  function  as  defined  above  is  asymmetric  in  its  two  arguments:  it  expects 
the  static  encoding  of  the  argument  type,  but  for  the  return  type  expects  the  type  itself.  This 
reflects  the  fact  that  the  result  depends  only  on  the  argument  type  and  not  the  result  type.  An 
alternative  (and  equally  valid)  definition  for  the  Vararg  primitive  can  be  given  that  expects  static 
encodings  for  both  arguments  (since  after  all,  the  argument  passed  to  the  Vararg  function  will 
almost  always  be  the  interpretation  of  a  static  encoding). 
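
The dispatch performed by the first form of Vararg can be sketched in Python as follows. This is a reading of the definition under the stated assumption that records with fewer than 3 fields are flattened; the classes Record, BFloat, Other, Times, Boxed and Fn, and the constant MAX_FLAT_FIELDS, are hypothetical names introduced for the sketch.

```python
from dataclasses import dataclass
from typing import Tuple

MAX_FLAT_FIELDS = 2  # assumption: records with fewer than 3 fields are flattened


@dataclass(frozen=True)
class Record:
    fields: Tuple[object, ...]


@dataclass(frozen=True)
class BFloat:
    pass


@dataclass(frozen=True)
class Other:
    ty: object


@dataclass(frozen=True)
class Times:                     # record type
    fields: Tuple[object, ...]


@dataclass(frozen=True)
class Boxed:
    ty: object


@dataclass(frozen=True)
class Fn:                        # multi-argument function type
    args: Tuple[object, ...]
    result: object


def vararg(enc, result):
    """Select a function type from the encoded argument type.

    `enc` is a static encoding; `result` is already an ordinary type,
    matching the asymmetric signature discussed in the text.
    """
    if isinstance(enc, Record):
        if len(enc.fields) <= MAX_FLAT_FIELDS:
            return Fn(enc.fields, result)           # pass the fields separately
        return Fn((Times(enc.fields),), result)     # pass the record as one value
    if isinstance(enc, BFloat):
        return Fn((Boxed("Float"),), result)
    return Fn((enc.ty,), result)
```

Changing MAX_FLAT_FIELDS corresponds to the machine dependent parameter mentioned above.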

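\[
\begin{array}{l}
\mathbf{Vararg} : T_{mil} \to T_{mil} \to T_{32} \stackrel{def}{=} \lambda(\alpha{:}T_{mil}).\lambda(\beta{:}T_{mil}).\ \mathsf{case}\ \alpha \\
\qquad\ \ \mathbf{Record}\ l \Rightarrow (\mathsf{case}\ l \\
\qquad\qquad\qquad\ \ [\,] \Rightarrow () \to \mathsf{interp}(\beta) \\
\qquad\qquad\qquad \mid [t] \Rightarrow (t) \to \mathsf{interp}(\beta) \\
\qquad\qquad\qquad \mid [t_1, t_2] \Rightarrow (t_1, t_2) \to \mathsf{interp}(\beta) \\
\qquad\qquad\qquad \mid \_ \Rightarrow (\times(l)) \to \mathsf{interp}(\beta)) \\
\qquad \mid \mathbf{BFloat} \Rightarrow (\mathsf{Boxed}\ \mathsf{Float}) \to \mathsf{interp}(\beta) \\
\qquad \mid \mathbf{Other}\ t \Rightarrow (t) \to \mathsf{interp}(\beta)
\end{array}
\]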

From a semantic standpoint, the two definitions are equivalent. In practice, however, using the
first definition is likely to provide noticeable performance benefits in type-checking, since it
provides a definition site for the interpretation of the result type. In the absence of this, a graph
reduction implementation is required to avoid repeated reductions. While this is certainly possible
(and may be desired in any case), careful design of the defined forms used in the translation can
yield substantial performance benefits for very small cost in effort. While I will for the most part
defer discussion of implementation issues until part 2 of this thesis, it is nonetheless worth pointing
out briefly here that a careful theoretical design can greatly impact the ease and efficiency of the
implementation.
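
As an informal illustration (not part of the formal development), the two Vararg definitions can be modelled in Python. Everything named here is hypothetical: the classes, the string model of LIL types, the tuple model of record types, and the flattening threshold MAX_FLAT = 2 are assumptions made for the example. Only the case structure follows the definitions above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Arrow:
    """A LIL function type (args) -> result; other LIL types are plain strings."""
    args: Tuple[object, ...]
    result: object


# Static encodings of kind T_mil (hypothetical model).
@dataclass(frozen=True)
class Record:
    fields: Tuple[object, ...]


@dataclass(frozen=True)
class BFloat:
    pass


@dataclass(frozen=True)
class Other:
    ty: object


MAX_FLAT = 2  # assumption: records of at most two fields are flattened


def interp(enc):
    """Map a static encoding back to the type it encodes."""
    if isinstance(enc, Record):
        return ("x", enc.fields)      # record type over the field list
    if isinstance(enc, BFloat):
        return "Boxed Float"
    return enc.ty                     # Other t


def vararg(arg_enc, result_ty):
    """First definition: encoded argument, un-encoded result type."""
    if isinstance(arg_enc, Record) and len(arg_enc.fields) <= MAX_FLAT:
        return Arrow(arg_enc.fields, result_ty)
    return Arrow((interp(arg_enc),), result_ty)


def vararg_enc(arg_enc, result_enc):
    """Second definition: both arguments are static encodings."""
    return vararg(arg_enc, interp(result_enc))
```

In this model the two variants agree whenever the result type passed to the first is the interpretation of the encoding passed to the second, which is the sense in which the text calls them semantically equivalent.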

Using these definitions, the constructor translation itself (figure 5.1) is almost completely
straightforward. For the most part, the translation proceeds compositionally over the constructors
in the obvious manner, applying the static encoding constructors at the leaves of the syntax
tree. The case for sum types is more interesting, since it makes the representation of sums as tagged
unions explicit. Note also that the known sum type from the MIL disappears. Known sums are
no longer necessary because the tagging has been made explicit, and hence the de-structuring of
sum values can be done via record selection, instead of via the proj primitive.

The most syntactically complex rule of the translation defines the compilation of MIL style
recursive types into LIL style parametric recursive types. The essential idea behind this translation
is to replace uses of multiple mutually recursive types with the fixpoint of a single parameterized
constructor. The first parameter of the constructor (as usual) is the recursive variable. The
second parameter serves as a selector parameter which indicates which field of the original mutually
recursive type is desired. The translation therefore proceeds by translating the bodies of the
recursive types, replacing uses of the multiple mutually recursive variables with partial applications
of the new recursive variable to appropriate sum injections indicating the selection of the appropriate
arm. The body of the function of which a fixpoint is to be taken uses its parameter to select the
appropriate arm of the recursive type to return. The final tuple of constructors is created by
instantiating the fixpoint once for each of the different arms of the original constructor.
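
The selector idea can be sketched in Python, purely as an illustration. The names `Rec`, `merge`, `INJ1` and `INJ2` are hypothetical, types are modelled as nested tuples, and the `Other` and `interp` wrappers of the actual translation are left out to keep the sketch small.

```python
from dataclasses import dataclass
from typing import Callable

INJ1, INJ2 = "inj1", "inj2"   # the two values of the selector kind 1 + 1


@dataclass(frozen=True)
class Rec:
    """An unevaluated fixpoint of f, applied to a selector."""
    f: Callable
    sel: str

    def unfold(self):
        # One unrolling: pass f a function that rebuilds the fixpoint
        # at whichever selector the body asks for.
        return self.f(lambda s: Rec(self.f, s))(self.sel)


def merge(c1, c2):
    """Turn mu(alpha, beta).(c1, c2) into two instances of one fixpoint.

    c1 and c2 are the bodies, given as functions of (alpha, beta).
    """
    def f(rho):
        def pick(w):
            # Uses of alpha and beta become rho applied to an injection.
            alpha, beta = rho(INJ1), rho(INJ2)
            return c1(alpha, beta) if w == INJ1 else c2(alpha, beta)
        return pick
    return Rec(f, INJ1), Rec(f, INJ2)


# Example: even = Unit + (Int x odd), odd = Int x even.
even, odd = merge(
    lambda a, b: ("sum", "Unit", ("pair", "Int", b)),
    lambda a, b: ("pair", "Int", a),
)
```

Unfolding `odd` once yields a pair whose second component is `even` again, so both arms are instances of the same parameterized fixpoint and differ only in the selector.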

5.2.3  Translation  of  typing  contexts 

The  kind  translation  extends  in  a  natural  way  to  define  a  translation  on  MIL  constructor  level 
typing  contexts. 

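\[
\begin{aligned}
|\bullet| &\stackrel{\mathrm{def}}{=} \bullet\\
|\Delta, \alpha{:}\kappa| &\stackrel{\mathrm{def}}{=} |\Delta|,\ \alpha{:}|\kappa|
\end{aligned}
\]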

5.2.4  Proofs of soundness for the constructor and kind translation

Lemma 4 (Well-formedness of $T_{\mathrm{mil}}$)

For all well-formed $\Delta$ such that the bound variables of $T_{\mathrm{mil}}$ are not in $\Delta$,


$\Delta \vdash T_{\mathrm{mil}}$ ok


Proof: By construction. Note that for any given $\Delta$, the definitions may be alpha-varied such
that the variable condition is satisfied.


Lemma 5 (Soundness of kind translation (1))

$\bullet \vdash |\kappa|$ ok

Proof: By induction over the structure of $\kappa$, and lemma 4.

Corollary 1

If $\vdash \Delta$ ok then $\Delta \vdash |\kappa|$ ok.

Proof: By lemma 5 and weakening.

Lemma 6

If $\alpha \notin \Delta$ then $\alpha \notin |\Delta|$.

Proof: Note that $\Delta$ and $|\Delta|$ have the same domain by construction.

Lemma 7 (Soundness of the context translation)

If $\vdash \Delta$ ok then $\vdash |\Delta|$ ok.

Proof: By induction over the structure of $\Delta$.

1. If $\Delta = \bullet$ then $|\Delta| = \bullet$ and $\vdash \bullet$ ok.

2. If $\Delta = \Delta', \alpha{:}\kappa$ then:

By assumption:
$\vdash \Delta'$ ok, $\alpha \notin \Delta'$, and $\vdash \kappa$ ok

By induction:
$\vdash |\Delta'|$ ok

By corollary 1:
$|\Delta'| \vdash |\kappa|$ ok

By lemma 6:
$\alpha \notin |\Delta'|$

By construction:
$\vdash |\Delta'|, \alpha{:}|\kappa|$ ok

By definition:
$\vdash |\Delta|$ ok

■

The  previous  result  is  used  to  prove  a  slightly  stronger  result  about  the  soundness  of  the  kind 
translation. 

Theorem  1  (Soundness  of  kind  translation  (2)) 

If $\vdash \Delta$ ok then $|\Delta| \vdash |\kappa|$ ok.

Proof: By lemma 7, $\vdash |\Delta|$ ok, so by corollary 1, $|\Delta| \vdash |\kappa|$ ok.

■ 

To assist in the proof of soundness of the constructor translation, I formalize the typing properties
of the definitions in the following lemma:

Lemma  8  (Well-formedness  of  definitions) 

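For all well-formed $\Delta$ such that the bound variables of the respective defined forms are not in $\Delta$,

$\Delta \vdash \mathsf{interp} : T_{\mathrm{mil}} \to T_{32}$
$\Delta \vdash \mathsf{Other} : T_{32} \to T_{\mathrm{mil}}$
$\Delta \vdash \mathsf{BFloat} : T_{\mathrm{mil}}$
$\Delta \vdash \mathsf{Record} : T_{32}\,\mathsf{list} \to T_{\mathrm{mil}}$
$\Delta \vdash \mathsf{Array} : T_{\mathrm{mil}} \to T_{32}$
$\Delta \vdash \mathsf{Vararg} : T_{\mathrm{mil}} \to T_{32} \to T_{32}$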

Proof:  By  construction.  Note  that  for  any  given  A,  the  definitions  may  be  alpha-varied  such 
that  the  variable  condition  is  satisfied. 

■ 

Using  this  lemma,  the  soundness  of  the  constructor  translation  follows  straightforwardly. 

Theorem  2  (Soundness  of  the  constructor  translation) 

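If $\Delta \vdash c : \kappa$ then $|\Delta| \vdash |c| : |\kappa|$.

Proof: By induction on the structure of typing derivations.

1. Suppose $\Delta, \alpha{:}\kappa \vdash \alpha : \kappa$. Then by induction $\vdash |\Delta, \alpha{:}\kappa|$ ok. By definition $|\Delta, \alpha{:}\kappa| = |\Delta|, \alpha{:}|\kappa|$,
and $|\alpha| = \alpha$, and so by construction $|\Delta, \alpha{:}\kappa| \vdash \alpha : |\kappa|$.

2. Suppose $\Delta \vdash \lambda(\alpha{::}\kappa_1).c : \kappa_1 \to \kappa_2$. By theorem 1, $|\Delta| \vdash |\kappa_1|$ ok, and by induction,
$|\Delta, \alpha{:}\kappa_1| \vdash |c| : |\kappa_2|$. But $|\Delta, \alpha{:}\kappa_1| = |\Delta|, \alpha{:}|\kappa_1|$, and by lemma 6, $\alpha \notin |\Delta|$. So by construction,
$|\Delta| \vdash |\lambda(\alpha{::}\kappa_1).c| : |\kappa_1 \to \kappa_2|$.

3. Suppose $\Delta \vdash c_1 c_2 : \kappa_2$. By induction, $|\Delta| \vdash |c_1| : |\kappa_1 \to \kappa_2|$ and $|\Delta| \vdash |c_2| : |\kappa_1|$. Therefore by
construction $|\Delta| \vdash |c_1 c_2| : |\kappa_2|$.

4. Suppose $\Delta \vdash \pi_1\,c : \kappa_1$. By induction, $|\Delta| \vdash |c| : |\kappa_1 \times \kappa_2|$. So by construction, $|\Delta| \vdash |\pi_1\,c| : |\kappa_1|$.

5. Suppose $\Delta \vdash \pi_2\,c : \kappa_2$. By a similar argument to the previous case, $|\Delta| \vdash |\pi_2\,c| : |\kappa_2|$.

6. Suppose $\Delta \vdash (c_1, c_2) : \kappa_1 \times \kappa_2$. By induction, $|\Delta| \vdash |c_1| : |\kappa_1|$ and $|\Delta| \vdash |c_2| : |\kappa_2|$. So by
construction, $|\Delta| \vdash |(c_1, c_2)| : |\kappa_1 \times \kappa_2|$.

7. Suppose $\Delta \vdash \mathsf{Int} : T_{32}$. Note that $|T_{32}| = T_{\mathrm{mil}}$, so it suffices to show that $|\Delta| \vdash |\mathsf{Int}| : T_{\mathrm{mil}}$.
By lemma 8, $|\Delta| \vdash \mathsf{Other} : T_{32} \to T_{\mathrm{mil}}$, and by the Int axiom, $|\Delta| \vdash \mathsf{Int} : T_{32}$. So by the
application rule, $|\Delta| \vdash \mathsf{Other}(\mathsf{Int}) : T_{\mathrm{mil}}$.

8. The other primitive type constructors follow similarly as with Int.

9. Suppose $\Delta \vdash \mu(\alpha, \beta).(c_1, c_2) : T_{32} \times T_{32}$. It suffices to show that

\[
|\Delta| \vdash (\mathsf{Other}(\mathsf{rec}_{1+1}(f, \mathsf{inj}_1\,*)),\ \mathsf{Other}(\mathsf{rec}_{1+1}(f, \mathsf{inj}_2\,*))) : T_{\mathrm{mil}} \times T_{\mathrm{mil}}
\]

where

\[
\begin{aligned}
f = \lambda(\rho{:}1+1 \to T_{32}).\lambda(\omega{:}1+1).\ \mathsf{case}\ \omega\ \ &\mathsf{inj}_1\,\_ \Rightarrow \mathsf{interp}(|c_1|[\mathsf{Other}(\rho(\mathsf{inj}_1\,*))/\alpha,\ \mathsf{Other}(\rho(\mathsf{inj}_2\,*))/\beta])\\
\mid\ &\mathsf{inj}_2\,\_ \Rightarrow \mathsf{interp}(|c_2|[\mathsf{Other}(\rho(\mathsf{inj}_1\,*))/\alpha,\ \mathsf{Other}(\rho(\mathsf{inj}_2\,*))/\beta])
\end{aligned}
\]

But note that it suffices to show that $|\Delta| \vdash f : (1+1 \to T_{32}) \to (1+1) \to T_{32}$, since the desired
derivation can then be produced by applying the rec rule to show the well-formedness of the
new recursive types; the application rule and lemma 8 to show the well-formedness of the
applications of Other, and the pairing rule to show the well-formedness of the pair.

By induction, $|\Delta, \alpha{:}T_{32}, \beta{:}T_{32}| \vdash |c_1| : T_{\mathrm{mil}}$ and by weakening, the freshness assumption, and
the definition of the context translation $|\Delta|, \alpha{:}T_{\mathrm{mil}}, \beta{:}T_{\mathrm{mil}}, \rho{:}(1+1 \to T_{32}) \vdash |c_1| : T_{\mathrm{mil}}$. By
the substitution lemma for the LIL (lemma 1):

\[
|\Delta|, \rho{:}(1+1 \to T_{32}) \vdash |c_1|[\mathsf{Other}(\rho(\mathsf{inj}_1\,*))/\alpha,\ \mathsf{Other}(\rho(\mathsf{inj}_2\,*))/\beta] : T_{\mathrm{mil}}
\]

By the application typing rule therefore:

\[
|\Delta|, \rho{:}(1+1 \to T_{32}) \vdash \mathsf{interp}(|c_1|[\mathsf{Other}(\rho(\mathsf{inj}_1\,*))/\alpha,\ \mathsf{Other}(\rho(\mathsf{inj}_2\,*))/\beta]) : T_{32}
\]

A similar argument holds for $c_2$. Therefore, by applying the typing rule for case and for
repeated lambda abstraction (again using freshness and weakening for $\omega$), we construct a
well-formedness derivation for $f$.

5.2.5  Additional properties of the constructor translation

Commutation with substitution

It is an additional property of the constructor translation that it commutes with substitution;
that is, it is compositional. This property is necessary for the proofs of a number of subsequent
theorems, and so I give a proof of it here.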

Lemma 9 (The constructor translation commutes with substitution)

$|c|[|c'|/\alpha] = |c[c'/\alpha]|$.

Proof: By induction on $c$.

1. If $c$ is a constant, then $\alpha \notin fv(c), fv(|c|)$, so $|c|[|c'|/\alpha] = |c| = |c[c'/\alpha]|$.

2. If $c$ is a variable $\alpha'$:

(a) If $\alpha \neq \alpha'$ then as in the previous case.

(b) If $\alpha = \alpha'$ then $|\alpha|[|c'|/\alpha] = \alpha[|c'|/\alpha] = |c'| = |\alpha[c'/\alpha]|$.

3. If $c = \lambda(\beta{:}\kappa).c_2$, then by definition

\[
|\lambda(\beta{:}\kappa).c_2|[|c'|/\alpha] = (\lambda(\beta{:}|\kappa|).|c_2|)[|c'|/\alpha] = \lambda(\beta{:}|\kappa|).(|c_2|[|c'|/\alpha])
\]

By induction and the definition of substitution:

\[
\lambda(\beta{:}|\kappa|).(|c_2|[|c'|/\alpha]) = \lambda(\beta{:}|\kappa|).(|c_2[c'/\alpha]|) = |\lambda(\beta{:}\kappa).(c_2[c'/\alpha])|
\]

Note, we rely on alpha-variance to ensure non-capture.

4. If $c = c_1 c_2$, $\pi_i\,c$, $(c_1, c_2)$, the proof proceeds similarly.
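
The statement of the lemma can be exercised on a toy model, offered here only as an illustration. The tuple representation of constructors, the function names, and the reduction of the primitive cases to a single `Other` wrapper are all assumptions of the sketch. As in the proof, bound variables are assumed to have been alpha-varied away from the free variables of the substituted constructor.

```python
def translate_kind(k):
    """|T32| = Tmil; arrow and product kinds are translated structurally."""
    if k == "T32":
        return "Tmil"
    tag, k1, k2 = k
    return (tag, translate_kind(k1), translate_kind(k2))


def translate(c):
    """A toy static encoding |c| over tuple-shaped MIL constructors."""
    tag = c[0]
    if tag == "var":
        return c
    if tag == "const":                      # primitive types get wrapped
        return ("other", c)
    if tag == "lam":
        _, b, k, body = c
        return ("lam", b, translate_kind(k), translate(body))
    if tag == "proj":
        return ("proj", c[1], translate(c[2]))
    return (tag, translate(c[1]), translate(c[2]))   # app, pair


def subst(c, a, r):
    """c[r/a], assuming bound variables do not clash with fv(r)."""
    tag = c[0]
    if tag == "var":
        return r if c[1] == a else c
    if tag == "const":
        return c
    if tag == "other":
        return ("other", subst(c[1], a, r))
    if tag == "lam":
        _, b, k, body = c
        return c if b == a else ("lam", b, k, subst(body, a, r))
    if tag == "proj":
        return ("proj", c[1], subst(c[2], a, r))
    return (tag, subst(c[1], a, r), subst(c[2], a, r))


def commutes(c, a, c_prime):
    """Check |c|[|c'|/a] == |c[c'/a]| for one instance."""
    return subst(translate(c), a, translate(c_prime)) == translate(subst(c, a, c_prime))
```

A check such as `commutes(c, "a", c_prime)` on a handful of constructors is of course no substitute for the induction above, but it is a cheap sanity test for an implementation of the translation.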


The  SE  translation  respects  equivalence 

Another  important  property  of  the  SE  translation  is  that  equivalent  MIL  constructors  translate 
to  equivalent  LIL  constructors.  In  addition  to  being  important  for  subsequent  proofs,  this  lemma 
is  an  important  “sanity-check”  on  the  translation. 

Lemma  10  (The  constructor  translation  respects  equivalence) 

If $\Delta \vdash c_1 \equiv c_2 : \kappa$, then $|\Delta| \vdash |c_1| \equiv |c_2| : |\kappa|$.

Proof: (By induction on equivalence derivations) All of the structural and type constructor
equivalence rules are unchanged from the MIL to the LIL, and the proof follows straightforwardly
by induction. I give the reflexivity rule, the pair rule and the beta rule as examples. All of the
primitive types follow directly by reflexivity or by the structural application rule.

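1. Suppose $\Delta \vdash c \equiv c : \kappa$ by reflexivity.

By assumption:
$\Delta \vdash c : \kappa$

By theorem 2:
$|\Delta| \vdash |c| : |\kappa|$

By reflexivity:
$|\Delta| \vdash |c| \equiv |c| : |\kappa|$

2. Suppose $\Delta \vdash (\lambda(\alpha{:}\kappa_1).c_1)(c_2) \equiv c_1[c_2/\alpha] : \kappa_2$.

By assumption:
$\vdash \kappa_1$ ok
$\Delta, \alpha{:}\kappa_1 \vdash c_1 : \kappa_2$
$\Delta \vdash c_2 : \kappa_1$

By theorems 1 and 2:
$|\Delta| \vdash |\kappa_1|$ ok
$|\Delta, \alpha{:}\kappa_1| \vdash |c_1| : |\kappa_2|$
$|\Delta| \vdash |c_2| : |\kappa_1|$

By the $\lambda$ beta rule and the definition of the translations:
$|\Delta| \vdash |(\lambda(\alpha{:}\kappa_1).c_1)(c_2)| \equiv |c_1|[|c_2|/\alpha] : |\kappa_2|$

Finally, by lemma 9:
$|\Delta| \vdash |(\lambda(\alpha{:}\kappa_1).c_1)(c_2)| \equiv |c_1[c_2/\alpha]| : |\kappa_2|$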


5.3  Dynamic  encoding  of  constructors 

The  two  translations  given  above,  |k|  and  |c|,  define  the  kind  of  the  static  encoding  of  a  constructor 
and  the  encoding  itself,  respectively.  The  second  element  of  the  mapping  from  MIL  to  LIL  is  giving 
dynamic  encodings  for  constructors  in  order  to  permit  the  move  from  a  type-passing  interpretation 
to  a  type  erasure  interpretation. 

A  constructor’s  dynamic  encoding  is  a  LIL  term  which  also  encodes  the  same  information  about 
the  original  MIL  constructor,  but  at  the  term  level  instead  of  at  the  constructor  level.  Dynamic 
encodings  are  used  to  implement  type  dispatch  at  runtime.  This  section  defines  the  dynamic 
encoding  translation. 

5.3.1  Dynamic  encoding  types 

The dynamic encoding translation of a constructor $c$ is notated as ${\downarrow}c{\downarrow}$. (The notation is intended
to be suggestive of the fact that the translation moves “down” a level, from constructors to terms.)
The type of the dynamic encoding of a constructor is driven by its kind: constructors of pair
kind get represented by terms of pair type, etc. However, the type of the dynamic encoding of
a constructor also depends on its static encoding: this is what captures the connection between
static and dynamic encodings. For a MIL kind $\kappa$ classifying a constructor $c$, I notate the dynamic
encoding type for the constructor as ${\downarrow}c{:}\kappa{\downarrow}$.

\[
\begin{aligned}
{\downarrow}c{:}T_{32}{\downarrow} &\stackrel{\mathrm{def}}{=} R(|c|)\\
{\downarrow}c{:}\kappa_1 \to \kappa_2{\downarrow} &\stackrel{\mathrm{def}}{=} \forall[\alpha{:}|\kappa_1|]({\downarrow}\alpha{:}\kappa_1{\downarrow}) \to {\downarrow}c\,\alpha{:}\kappa_2{\downarrow}\\
{\downarrow}c{:}\kappa_1 \times \kappa_2{\downarrow} &\stackrel{\mathrm{def}}{=} {\downarrow}\pi_1\,c{:}\kappa_1{\downarrow} \times {\downarrow}\pi_2\,c{:}\kappa_2{\downarrow}
\end{aligned}
\]

The DET translation is defined in terms of a type level function $R$ of type $T_{\mathrm{mil}} \to T_{32}$. This
function is defined shortly, but for the sake of understanding the translation it is easier to view this
abstractly as the type of representations of constructors of kind $T_{32}$.

For  constructors  of  pair  kind,  the  DET  is  straightforward:  a  pair  type  with  fields  constructed 
by  projecting  out  the  two  halves  of  the  pair  and  applying  the  DET  translation  to  each  of  them  at 
the  sub-component  kind. 

Constructor functions are slightly more interesting. The essential idea is that constructor functions
(functions from types to types) will become term functions (functions from terms to terms).
Accordingly,  the  DET  of  a  constructor  of  arrow  kind  is  a  term  level  function  type.  However,  note 
that  this  requires  us  to  construct  the  DET  of  the  argument  type  and  the  result  type  of  the  function, 
which  in  turn  requires  us  to  have  the  static  encoding  of  the  argument  to  the  type  function  (which 
of  course  is  not  yet  available).  The  solution  to  this  dilemma  is  to  observe  that  the  DE  function 
in  question  must  be  polymorphic  over  all  possible  static  encodings:  the  DET  of  the  argument  and 
result  can  then  be  given  in  terms  of  the  eventual  encoded  type  passed  to  the  function.  This  point 
is  essential  to  the  methodological  goal  of  maintaining  the  connection  between  static  encodings  and 
dynamic  encodings. 

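So far, I have avoided discussing the particulars of the actual encodings of types (that is, of
constructors of kind $T_{32}$). Here what is required is something of the nature of the representation
types of $\lambda_R$ [CWM98]: the type of representation of a specific type. Unlike in $\lambda_R$, these types are
not primitive here. Instead, they are programmed directly in the LIL in the following fashion.

\[
\begin{aligned}
R : T_{\mathrm{mil}} \to T_{32} =\ &\lambda(\alpha{:}T_{\mathrm{mil}}).\\
&\mathsf{case}\ \alpha\\
&\quad (\mathsf{Record}\ \beta \Rightarrow\\
&\quad\qquad (\mathsf{case}\ \beta\ ([\,] \Rightarrow \mathsf{Unit} \mid \_ \Rightarrow \mathsf{Void})\\
&\quad\qquad +\ \mathsf{case}\ \beta\ ([\_] \Rightarrow \mathsf{Unit} \mid \_ \Rightarrow \mathsf{Void})\\
&\quad\qquad +\ \mathsf{case}\ \beta\ ([\_, \_] \Rightarrow \mathsf{Unit} \mid \_ \Rightarrow \mathsf{Void})\\
&\quad\qquad +\ \mathsf{case}\ \beta\ (\_ \Rightarrow \mathsf{Unit} \mid \_ \Rightarrow \mathsf{Void}))\\
&\quad \mid \_ \Rightarrow \mathsf{Void})\\
&+\ \mathsf{case}\ \alpha\ (\mathsf{BFloat} \Rightarrow \mathsf{Unit} \mid \_ \Rightarrow \mathsf{Void})\\
&+\ \mathsf{case}\ \alpha\ (\mathsf{Other}\ \_ \Rightarrow \mathsf{Unit} \mid \_ \Rightarrow \mathsf{Void})
\end{aligned}
\]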

This definition is somewhat subtle. At the top level, the representations of types are always
sums, indicating whether the type being represented is a record, boxed float, or an undistinguished
type. The value carried in each branch of the sum serves as a witness to the identity of the original
type. So for example in the boxed float case the carried value will have type
$\mathsf{case}\ \alpha\ (\mathsf{BFloat} \Rightarrow \mathsf{Unit} \mid \_ \Rightarrow \mathsf{Void})$. Since the Void type is uninhabited, having a value of this type means that $\alpha$
can only be the static representation of Boxed(Float). This information can be reflected back into
the type system via the special vcase construct, whereby code can use this witness to refine $\alpha$.
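
The witness idea can be mimicked in a few lines of Python, as a rough illustration only. The tuple model of static encodings, the branch names, and the flattening of the nested record sum into a single list are assumptions of the sketch, as is reading the fourth record branch as "more than two fields".

```python
def summands(enc):
    """Summands of R(enc), flattened left to right, each 'Unit' or 'Void'.

    enc is a static encoding: ("Record", fields), ("BFloat",) or ("Other", t).
    """
    tag = enc[0]
    n = len(enc[1]) if tag == "Record" else None   # field count, if a record

    def unit_if(cond):
        return "Unit" if cond else "Void"

    return [
        ("record0", unit_if(n == 0)),
        ("record1", unit_if(n == 1)),
        ("record2", unit_if(n == 2)),
        ("recordN", unit_if(n is not None and n > 2)),   # assumed reading
        ("bfloat", unit_if(tag == "BFloat")),
        ("other", unit_if(tag == "Other")),
    ]


def inhabited(enc):
    """Branches of R(enc) that can actually carry a value."""
    return [name for name, ty in summands(enc) if ty == "Unit"]
```

For every encoding exactly one branch is inhabited, so holding a value in the `bfloat` branch, say, can only happen when the encoding is `("BFloat",)`. This is the information that the text describes vcase as reflecting back into the type system.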

5.3.2  Notational  issues 

Unfortunately, the named-form syntactic restrictions imposed on LIL terms add a level of inessential
complexity to the dynamic encoding translations. Constructors in the MIL are not in named-form:
in fact, in the absence of singleton kinds or some other definitional mechanism there is in general no
equivalent named-form for an arbitrary constructor. This means that translating MIL constructors
into LIL terms requires the dynamic encoding essential to the translation to occur simultaneously

Figure 5.2: Derived non-named-form expressions


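\[
\frac{\Psi; \Delta; \Gamma \vdash e_1 : \tau_1\ \mathsf{exp} \qquad \Psi; \Delta; \Gamma \vdash e_2 : \tau_2\ \mathsf{exp}}
     {\Psi; \Delta; \Gamma \vdash (e_1, e_2) : \tau_1 \times \tau_2\ \mathsf{dexp}}
\qquad
\frac{\Psi; \Delta; \Gamma \vdash e : \tau_0 \times \tau_1\ \mathsf{exp} \qquad (i \in 0, 1)}
     {\Psi; \Delta; \Gamma \vdash \mathsf{select}_i\ e : \tau_i\ \mathsf{dexp}}
\]

\[
\frac{\Delta \vdash c \equiv +[\tau_1, \ldots, \times[\mathsf{Tag}(i), \tau_i], \ldots, \tau_n] : T_{32} \qquad \Psi; \Delta; \Gamma \vdash e : \tau_i\ \mathsf{exp} \qquad (i \in 1 \ldots n)}
     {\Psi; \Delta; \Gamma \vdash \mathsf{inj}_i\ e : c\ \mathsf{dexp}}
\]

\[
\frac{\Delta \vdash c : \kappa \qquad \Psi; \Delta; \Gamma \vdash e_1 : \forall(\alpha{:}\kappa).\tau_1 \to \tau_2\ \mathsf{exp} \qquad \Psi; \Delta; \Gamma \vdash e_2 : \tau_1[c/\alpha]\ \mathsf{exp}}
     {\Psi; \Delta; \Gamma \vdash e_1[c]e_2 : \tau_2[c/\alpha]\ \mathsf{dexp}}
\]

Figure 5.3: Typing rules for derived (non-named-form) expressions

\[
\begin{aligned}
{\downarrow}\alpha{\downarrow} &\stackrel{\mathrm{def}}{=} x_\alpha\\
{\downarrow}\lambda(\alpha{:}\kappa).c{\downarrow} &\stackrel{\mathrm{def}}{=} \lambda[\alpha{:}|\kappa|](x_\alpha{:}{\downarrow}\alpha{:}\kappa{\downarrow}).{\downarrow}c{\downarrow}\\
{\downarrow}c_1 c_2{\downarrow} &\stackrel{\mathrm{def}}{=} {\downarrow}c_1{\downarrow}[|c_2|]({\downarrow}c_2{\downarrow})\\
{\downarrow}\pi_i\,c{\downarrow} &\stackrel{\mathrm{def}}{=} \mathsf{select}_i\ {\downarrow}c{\downarrow}\\
{\downarrow}(c_1, c_2){\downarrow} &\stackrel{\mathrm{def}}{=} ({\downarrow}c_1{\downarrow}, {\downarrow}c_2{\downarrow})
\end{aligned}
\]

[Remaining cases: ${\downarrow}\mathsf{Unit}{\downarrow}$, ${\downarrow}{\times}[c]{\downarrow}$, ${\downarrow}c_1 \times c_2{\downarrow}$, ${\downarrow}{\times}[c_1, c_2, \ldots, c_n]{\downarrow}$, ${\downarrow}\mathsf{Boxf}{\downarrow}$, ${\downarrow}\mu(\alpha,\beta).(c_1,c_2){\downarrow}$, and ${\downarrow}c{\downarrow}$ otherwise. Each is an injection into the corresponding representation type, such as $R(|\mathsf{Unit}|)$, $R(|c_1 \times c_2|)$ or $R(|{\times}[c_1, c_2, \ldots, c_n]|)$; the right-hand sides are not recoverable.]


Figure 5.4: The dynamic encoding translation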


Figure  5.4:  The  dynamic  encoding  translation 


with  a  mostly  un-interesting  “lifting”  process  which  pulls  bindings  out  of  expressions  and  returns 
them  to  named  form. 

In  order  to  isolate  the  core  issues  of  the  translation  from  these  syntactic  issues,  it  is  useful  to 
define  derived  syntactic  forms  (figure  5.2)  that  allow  the  translation  to  make  use  of  apparently 
arbitrary  expressions  without  concerning  itself  with  named  form.  Derived  typing  rules  (figure  5.3) 
for  the  extended  syntax  with  an  accompanying  proof  of  soundness  (section  5.3.5)  permit  the  free 
use  of  the  extended  syntax  in  the  translation. 

Only a small number of derived expression forms are needed for the purposes of the dynamic
encoding translation: additional definitions are possible. All variables are assumed to be fresh in
the translation: this guarantees that variables will not conflict as bindings are “lifted”. This could
also be dealt with by explicitly alpha-varying terms as necessary.
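
The lifting performed by the derived pair form can be sketched as follows. This is an illustrative Python model, not the definition from figure 5.2: the tuple representation of expressions, the `pair` helper, and the fresh-name counter are all assumptions. An expression is taken to be either a small value `("sv", v)` or a binding `("let", x, op, body)`.

```python
import itertools

# Hypothetical supply of fresh variable names.
_fresh = (f"x{i}" for i in itertools.count())


def pair(e1, e2):
    """Build a named-form expression for the derived pair (e1, e2).

    Bindings in either component are pulled outwards until both
    components are small values, which are then bound to a fresh name.
    """
    if e1[0] == "let":
        _, x, op, body = e1
        return ("let", x, op, pair(body, e2))
    if e2[0] == "let":
        _, x, op, body = e2
        return ("let", x, op, pair(e1, body))
    x = next(_fresh)
    return ("let", x, ("tuple", e1[1], e2[1]), ("sv", x))
```

Because every bound name is assumed fresh, moving a binding outwards past the other component cannot capture any of that component's variables, which is exactly why the text imposes the freshness assumption.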

5.3.3  Dynamic  encodings 

The term-level encoding translation of constructors is given in figure 5.4 and is for the most part
fairly intuitive. Note that the translation takes advantage of the defined forms from section 5.3.2
for presentational clarity: without this, it is necessary for the translation to handle explicitly the
lifting of bindings. In point of fact, this sort of issue arises in many parts of the implementation of
the translation from MIL to LIL and is handled using a monadic structure similar to the defined
syntax used here.

For  the  purposes  of  the  translation  I  assume  an  unspecified  anti-symmetric  injection  from  type 
variables  into  a  set  of  term  variables  disjoint  from  those  produced  elsewhere  in  the  translation.  I 
refer  to  the  term  variables  produced  by  this  injection  informally  as  being  “indexed”  by  the  type 
variable.  Note  that  the  translation  does  not  rely  on  the  ability  to  recover  the  original  type  variable 
from  the  indexed  term  variable. 

Type functions become term functions which take both the static and dynamic representations
of the argument and return the dynamic representation of the result. Applications are modified in
the corresponding fashion. Type level pairs and projections map to term level pairs and projections.
For constructors of kind $T_{32}$, it is necessary to construct the type representations by injecting the
appropriate witnesses into the sum type described above.
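
A small Python model may help make the shape of the translation concrete. It is only a sketch under stated assumptions: constructors are tuples, representations are reduced to simple tags rather than witness-carrying sums, and the names `static`, `dynamic`, `rep` and `BASE` are hypothetical. Type-level functions are modelled by Python closures, so the static side is evaluated rather than translated.

```python
# Static encodings of three base constructors (hypothetical model).
BASE = {"unit": ("Record", ()), "boxf": ("BFloat",), "int": ("Other", "Int")}


def rep(enc):
    """Dynamic representation (a tag) of a static encoding of kind T_mil."""
    if enc[0] == "Record":
        return ("record", len(enc[1]))
    return ("bfloat",) if enc[0] == "BFloat" else ("other",)


def static(c, env):
    """Evaluate the static encoding of c; env maps variables to (static, dynamic)."""
    tag = c[0]
    if tag in BASE:
        return BASE[tag]
    if tag == "var":
        return env[c[1]][0]
    if tag == "lam":
        return lambda s: static(c[2], {**env, c[1]: (s, None)})
    if tag == "app":
        return static(c[1], env)(static(c[2], env))
    if tag == "pair":
        return (static(c[1], env), static(c[2], env))
    return static(c[2], env)[c[1] - 1]          # ("proj", i, c), i in 1..2


def dynamic(c, env):
    """Evaluate the dynamic encoding of c."""
    tag = c[0]
    if tag in BASE:
        return rep(BASE[tag])
    if tag == "var":
        return env[c[1]][1]
    if tag == "lam":
        # A type function becomes a term function of the static and
        # dynamic representations of its argument.
        return lambda s, d: dynamic(c[2], {**env, c[1]: (s, d)})
    if tag == "app":
        return dynamic(c[1], env)(static(c[2], env), dynamic(c[2], env))
    if tag == "pair":
        return (dynamic(c[1], env), dynamic(c[2], env))
    return dynamic(c[2], env)[c[1] - 1]
```

For a closed constructor of base kind, `dynamic(c, {})` coincides with `rep(static(c, {}))` in this model, which is the informal counterpart of the connection between static and dynamic encodings that the typing of the translation is designed to maintain.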


5.3.4  Dynamic  encoding  of  type  contexts 

Since the DE translation is defined on open terms, the question naturally arises of what the
appropriate notion of typing context for dynamic encodings should be. Just as the DE translation
maps constructors to types, the corresponding context translation maps constructor contexts to
term contexts using the DET translation defined above. Again, I use the down arrow syntax (${\downarrow}\cdot{\downarrow}$)
to suggest the movement “down” a level in the syntactic hierarchy.

\[
\begin{aligned}
{\downarrow}\bullet{\downarrow} &\stackrel{\mathrm{def}}{=} \bullet\\
{\downarrow}\Delta, \alpha{:}\kappa{\downarrow} &\stackrel{\mathrm{def}}{=} {\downarrow}\Delta{\downarrow},\ x_\alpha{:}{\downarrow}\alpha{:}\kappa{\downarrow}
\end{aligned}
\]

Note that the types produced by the DET translation will contain references to the type variables
from the original context: that is, the context produced by the translation will have free type
variables. These variables are described by the context produced by the static encoding of the
original context: that is, $|\Delta| \vdash {\downarrow}\Delta{\downarrow}$ ok.


5.3.5  Proofs  for  the  dynamic  encoding  translations 
Proof  of  soundness  of  the  DET  translation 

I  begin  by  showing  the  well-formedness  of  the  definition  of  representation  types. 

Lemma  11  (Well-formedness  of  R) 

For all well-formed $\Delta$ such that the bound variables of $R$ are not in $\Delta$, $\Delta \vdash R : T_{\mathrm{mil}} \to T_{32}$.

Proof: By construction. Note that for any given $\Delta$, the definition may be alpha-varied such that
the variable condition is satisfied.


Using this and the soundness theorems for the SE and SEK translations, I show the soundness
of the DET translation.

Lemma 12 (Soundness of the dynamic encoding type translation)

If $\Delta \vdash c : \kappa$ then $|\Delta| \vdash {\downarrow}c{:}\kappa{\downarrow} : T_{32}$.

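Proof: By induction on kinds $\kappa$.

1. Suppose $\kappa = T_{32}$. Then by definition, ${\downarrow}c{:}\kappa{\downarrow} = R(|c|)$. By theorem 2, $|\Delta| \vdash |c| : T_{\mathrm{mil}}$, and so by
lemma 11 and the application typing rule, $|\Delta| \vdash R(|c|) : T_{32}$.

2. Suppose $\kappa = \kappa_1 \to \kappa_2$. Then by definition, ${\downarrow}c{:}\kappa{\downarrow} = \forall[\alpha{:}|\kappa_1|]({\downarrow}\alpha{:}\kappa_1{\downarrow}) \to {\downarrow}c\,\alpha{:}\kappa_2{\downarrow}$. It suffices
to show that this is well-formed via the formation rule for universals. By theorem 1, $|\Delta| \vdash |\kappa_1|$ ok.
By the variable rule, $\Delta, \alpha{:}\kappa_1 \vdash \alpha : \kappa_1$, and so by induction, $|\Delta, \alpha{:}\kappa_1| \vdash {\downarrow}\alpha{:}\kappa_1{\downarrow} : T_{32}$.
Again using the variable rule along with the assumption, we obtain that $\Delta, \alpha{:}\kappa_1 \vdash c\,\alpha : \kappa_2$ by
the application rule, and so by induction, $|\Delta, \alpha{:}\kappa_1| \vdash {\downarrow}c\,\alpha{:}\kappa_2{\downarrow} : T_{32}$.
We therefore may invoke the formation rule for universals to produce a derivation of the
desired result.

3. Suppose $\kappa = \kappa_1 \times \kappa_2$. Then by definition, ${\downarrow}c{:}\kappa{\downarrow} = {\downarrow}\pi_1\,c{:}\kappa_1{\downarrow} \times {\downarrow}\pi_2\,c{:}\kappa_2{\downarrow}$. It suffices to show that
this is well-formed via the formation rule for pair types by showing that the two component
types are well-formed. This follows immediately by induction.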


As  with  the  SE  translation,  the  fact  that  the  DET  translation  commutes  with  substitution  is 
useful  for  subsequent  proofs. 

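Lemma 13 (The DET translation commutes with substitution)

${\downarrow}c{:}\kappa{\downarrow}[|c'|/\alpha] = {\downarrow}c[c'/\alpha]{:}\kappa{\downarrow}$.

Proof: By induction on kinds $\kappa$.

1. If $\kappa = T_{32}$, then ${\downarrow}c{:}\kappa{\downarrow} = R(|c|)$. Since $R$ is closed, $R(|c|)[|c'|/\alpha] = R(|c|[|c'|/\alpha])$. By lemma 9,
$|c|[|c'|/\alpha] = |c[c'/\alpha]|$. Finally, by definition, $R(|c[c'/\alpha]|) = {\downarrow}c[c'/\alpha]{:}T_{32}{\downarrow}$.

2. If $\kappa = \kappa_1 \to \kappa_2$ then by definition

\[
\begin{aligned}
{\downarrow}c{:}\kappa_1 \to \kappa_2{\downarrow}[|c'|/\alpha] &= (\forall[\beta{:}|\kappa_1|]({\downarrow}\beta{:}\kappa_1{\downarrow}) \to ({\downarrow}c\,\beta{:}\kappa_2{\downarrow}))[|c'|/\alpha]\\
&= \forall[\beta{:}|\kappa_1|]({\downarrow}\beta{:}\kappa_1{\downarrow}) \to ({\downarrow}c\,\beta{:}\kappa_2{\downarrow}[|c'|/\alpha])
\end{aligned}
\]

Note that $\alpha \neq \beta$ by the assumption of freshness in the translation, and hence

\[
{\downarrow}\beta{:}\kappa_1{\downarrow}[|c'|/\alpha] = {\downarrow}\beta{:}\kappa_1{\downarrow}
\]

By induction, ${\downarrow}c\,\beta{:}\kappa_2{\downarrow}[|c'|/\alpha] = {\downarrow}(c[c'/\alpha])\,\beta{:}\kappa_2{\downarrow}$, again using the fact that $\alpha \neq \beta$. Finally, we
observe that

\[
\forall[\beta{:}|\kappa_1|]({\downarrow}\beta{:}\kappa_1{\downarrow}) \to ({\downarrow}(c[c'/\alpha])\,\beta{:}\kappa_2{\downarrow}) = {\downarrow}c[c'/\alpha]{:}\kappa_1 \to \kappa_2{\downarrow}
\]

3. If $\kappa = \kappa_1 \times \kappa_2$ then by definition,

\[
\begin{aligned}
{\downarrow}c{:}\kappa_1 \times \kappa_2{\downarrow}[|c'|/\alpha] &= ({\downarrow}\pi_1\,c{:}\kappa_1{\downarrow} \times {\downarrow}\pi_2\,c{:}\kappa_2{\downarrow})[|c'|/\alpha]\\
&= ({\downarrow}\pi_1\,c{:}\kappa_1{\downarrow}[|c'|/\alpha]) \times ({\downarrow}\pi_2\,c{:}\kappa_2{\downarrow}[|c'|/\alpha])
\end{aligned}
\]

By induction, $({\downarrow}\pi_i\,c{:}\kappa_i{\downarrow}[|c'|/\alpha]) = {\downarrow}\pi_i\,c[c'/\alpha]{:}\kappa_i{\downarrow}$, so

\[
\begin{aligned}
({\downarrow}\pi_1\,c{:}\kappa_1{\downarrow}[|c'|/\alpha]) \times ({\downarrow}\pi_2\,c{:}\kappa_2{\downarrow}[|c'|/\alpha]) &= {\downarrow}\pi_1\,c[c'/\alpha]{:}\kappa_1{\downarrow} \times {\downarrow}\pi_2\,c[c'/\alpha]{:}\kappa_2{\downarrow}\\
&= {\downarrow}c[c'/\alpha]{:}\kappa_1 \times \kappa_2{\downarrow}
\end{aligned}
\]

Proof of soundness of the DE typing context translation

Lemma 14

If $\alpha \notin \Delta$ then $x_\alpha \notin {\downarrow}\Delta{\downarrow}$.

Proof: By assumption, the injection from type variables to term variables is anti-symmetric with
respect to variable equality. Therefore, by anti-symmetry, if $\beta \neq \alpha$ then $x_\beta \neq x_\alpha$. Since the set of
variables in the domain of $\Delta$ does not contain $\alpha$, the image of that set under the injection does
not contain $x_\alpha$.

■

Lemma 15 (Soundness of the context translation)

If $\vdash \Delta$ ok then $|\Delta| \vdash {\downarrow}\Delta{\downarrow}$ ok.

Proof: By induction over the structure of $\Delta$.

1. If $\Delta = \bullet$ then ${\downarrow}\Delta{\downarrow} = \bullet$ and $\bullet \vdash \bullet$ ok.

2. If $\Delta = \Delta', \alpha{:}\kappa$ then by assumption, $\vdash \Delta'$ ok, $\alpha \notin \Delta'$, and $\vdash \kappa$ ok. By the variable rule,
$\Delta', \alpha{:}\kappa \vdash \alpha : \kappa$, and so by lemma 12, $|\Delta'|, \alpha{:}|\kappa| \vdash {\downarrow}\alpha{:}\kappa{\downarrow} : T_{32}$. By induction, $|\Delta'| \vdash {\downarrow}\Delta'{\downarrow}$ ok,
and by weakening and lemma 6, $|\Delta'|, \alpha{:}|\kappa| \vdash {\downarrow}\Delta'{\downarrow}$ ok.

Finally by the formation rule for contexts $|\Delta'|, \alpha{:}|\kappa| \vdash {\downarrow}\Delta'{\downarrow}, x_\alpha{:}{\downarrow}\alpha{:}\kappa{\downarrow}$ ok. Note that the side
condition follows from lemma 14.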


Proof  of  soundness  of  the  DE  translation 

Lemma  16  (Inversion  of  derived  expression  derivations) 

For any derivation $D$ of $\Psi; \Delta; \Gamma \vdash e : \tau$ dexp the last rule of $D$ is uniquely determined by $e$.

Proof: By inspection. Note that for any choice of $e$, exactly one rule applies. ■

Since the DE translation relies on the extended syntax defined in section 5.3.2, it is first necessary
to show that well-typedness as a derived form corresponds with well-typedness of expressions.

Lemma 17 (Derived expression rules)

If $\Psi; \Delta; \Gamma \vdash e : \tau$ dexp then $\Psi; \Delta; \Gamma \vdash e : \tau$ exp.

Proof: By induction on the definition of the derived forms. The proof is straightforward: I give
here one example case in detail.

1. $\Psi; \Delta; \Gamma \vdash (e_1, e_2) : \tau_1 \times \tau_2$ dexp. By inversion (lemma 16), $\Psi; \Delta; \Gamma \vdash e_1 : \tau_1$ exp and
$\Psi; \Delta; \Gamma \vdash e_2 : \tau_2$ exp. There are three possible cases in the definition of the derived form.

(a) $e_1 = sv_1$, $e_2 = sv_2$. Then $(sv_1, sv_2) \stackrel{\mathrm{def}}{=} \mathsf{let}\ x = (sv_1, sv_2)\ \mathsf{in}\ x$, which is well-formed by
construction using the assumptions and the freshness of $x$.

(b) $e_1 = sv_1$, $e_2 = \mathsf{let}\ x = i\ \mathsf{in}\ e_2'$. Then $(sv_1, e_2) \stackrel{\mathrm{def}}{=} \mathsf{let}\ x = i\ \mathsf{in}\ (sv_1, e_2')$. By inverting
the second assumption (3), $\Psi; \Delta; \Gamma \vdash i : \tau_x$ op and $\Psi; \Delta; \Gamma, x{:}\tau_x \vdash e_2' : \tau_2$ exp. By
weakening the first assumption (lemma 2), $\Psi; \Delta; \Gamma, x{:}\tau_x \vdash sv_1 : \tau_1$ exp. By the derived
typing rule for pairs $\Psi; \Delta; \Gamma, x{:}\tau_x \vdash (sv_1, e_2') : \tau_1 \times \tau_2$ dexp, and hence by induction
$\Psi; \Delta; \Gamma, x{:}\tau_x \vdash (sv_1, e_2') : \tau_1 \times \tau_2$ exp. Finally, by construction using the operation rule,
$\Psi; \Delta; \Gamma \vdash \mathsf{let}\ x = i\ \mathsf{in}\ (sv_1, e_2') : \tau_1 \times \tau_2$ exp.

(c) $e_1 = \mathsf{let}\ x = i\ \mathsf{in}\ e_1'$, $e_2 = sv_2$. This case proceeds exactly as the previous case.


2.  The  other  cases  proceed  similarly. 


Finally,  I  show  the  soundness  of  the  dynamic  encoding  translation,  using  the  soundness  theorems 
for  the  SEK,  SE,  and  DET  translations,  along  with  the  commutativity  lemma  for  the  DET 
translation  and  the  soundness  lemmas  for  contexts. 


Theorem  3  (Soundness  of  the  dynamic  encoding  translation) 

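If $\Delta \vdash c : \kappa$ and $\vdash \Psi$ ok then $\Psi; |\Delta|; {\downarrow}\Delta{\downarrow} \vdash {\downarrow}c{\downarrow} : {\downarrow}c{:}\kappa{\downarrow}$ exp.

Proof: By induction on derivations of $\Delta \vdash c : \kappa$.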


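1. $\Delta[\alpha{:}\kappa] \vdash \alpha : \kappa$.

By assumption:
$\vdash \Delta[\alpha{:}\kappa]$ ok
$\vdash \Psi$ ok

By lemmas 7 and 15:
$\vdash |\Delta[\alpha{:}\kappa]|$ ok
$|\Delta[\alpha{:}\kappa]| \vdash {\downarrow}\Delta[\alpha{:}\kappa]{\downarrow}$ ok

By definition:
${\downarrow}\alpha{\downarrow} = x_\alpha$
${\downarrow}\Delta[\alpha{:}\kappa]{\downarrow} = {\downarrow}\Delta{\downarrow}[x_\alpha{:}{\downarrow}\alpha{:}\kappa{\downarrow}]$

Hence by the variable rule:
$\Psi; |\Delta[\alpha{:}\kappa]|; {\downarrow}\Delta{\downarrow}[x_\alpha{:}{\downarrow}\alpha{:}\kappa{\downarrow}] \vdash x_\alpha : {\downarrow}\alpha{:}\kappa{\downarrow}$ exp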


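2. $\Delta \vdash \lambda(\alpha{:}\kappa_1).c : \kappa_1 \to \kappa_2$.

By assumption:
$\vdash \kappa_1$ ok, $\alpha \notin fv(\Delta)$
$\Delta, \alpha{:}\kappa_1 \vdash c : \kappa_2$

By theorem 1 and lemma 12:
$|\Delta, \alpha{:}\kappa_1| \vdash |\kappa_1|$ ok
$|\Delta, \alpha{:}\kappa_1| \vdash {\downarrow}\alpha{:}\kappa_1{\downarrow} : T_{32}$

By induction:
$\Psi; |\Delta, \alpha{:}\kappa_1|; {\downarrow}\Delta, \alpha{:}\kappa_1{\downarrow} \vdash {\downarrow}c{\downarrow} : {\downarrow}c{:}\kappa_2{\downarrow}$ exp

So by the function introduction rule:
$\Psi; |\Delta|; {\downarrow}\Delta{\downarrow} \vdash {\downarrow}\lambda(\alpha{:}\kappa_1).c{\downarrow} : {\downarrow}\lambda(\alpha{:}\kappa_1).c{:}\kappa_1 \to \kappa_2{\downarrow}$ exp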


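3. $\Delta \vdash c_1 c_2 : \kappa_2$.

By assumption:
$\Delta \vdash c_1 : \kappa_1 \to \kappa_2$
$\Delta \vdash c_2 : \kappa_1$

By induction:
$\Psi; |\Delta|; {\downarrow}\Delta{\downarrow} \vdash {\downarrow}c_1{\downarrow} : {\downarrow}c_1{:}\kappa_1 \to \kappa_2{\downarrow}$ exp
$\Psi; |\Delta|; {\downarrow}\Delta{\downarrow} \vdash {\downarrow}c_2{\downarrow} : {\downarrow}c_2{:}\kappa_1{\downarrow}$ exp

By theorem 2:
$|\Delta| \vdash |c_2| : |\kappa_1|$

Note that ${\downarrow}c_1{:}\kappa_1 \to \kappa_2{\downarrow} \stackrel{\mathrm{def}}{=} \forall[\alpha{:}|\kappa_1|]({\downarrow}\alpha{:}\kappa_1{\downarrow}) \to {\downarrow}c_1\,\alpha{:}\kappa_2{\downarrow}$.

By lemma 13:
${\downarrow}\alpha{:}\kappa_1{\downarrow}[|c_2|/\alpha] = {\downarrow}c_2{:}\kappa_1{\downarrow}$
${\downarrow}c_1\,\alpha{:}\kappa_2{\downarrow}[|c_2|/\alpha] = {\downarrow}c_1 c_2{:}\kappa_2{\downarrow}$
(Note $\alpha \notin fv(c_1)$ by freshness).

Therefore, by the derived rule for application of expressions:
$\Psi; |\Delta|; {\downarrow}\Delta{\downarrow} \vdash {\downarrow}c_1 c_2{\downarrow} : {\downarrow}c_1 c_2{:}\kappa_2{\downarrow}$ dexp

Finally, observe that by lemma 17:
$\Psi; |\Delta|; {\downarrow}\Delta{\downarrow} \vdash {\downarrow}c_1 c_2{\downarrow} : {\downarrow}c_1 c_2{:}\kappa_2{\downarrow}$ exp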

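4. $\Delta \vdash \pi_1\,c : \kappa_1$.

By assumption:
$\Delta \vdash c : \kappa_1 \times \kappa_2$

By induction:
$\Psi; |\Delta|; {\downarrow}\Delta{\downarrow} \vdash {\downarrow}c{\downarrow} : {\downarrow}c{:}\kappa_1 \times \kappa_2{\downarrow}$ exp

By definition:
${\downarrow}c{:}\kappa_1 \times \kappa_2{\downarrow} = {\downarrow}\pi_1\,c{:}\kappa_1{\downarrow} \times {\downarrow}\pi_2\,c{:}\kappa_2{\downarrow}$

By the typing rule for the $\mathsf{select}_i\ e$ derived form:
$\Psi; |\Delta|; {\downarrow}\Delta{\downarrow} \vdash \mathsf{select}_1\ {\downarrow}c{\downarrow} : {\downarrow}\pi_1\,c{:}\kappa_1{\downarrow}$ dexp

By lemma 17:
$\Psi; |\Delta|; {\downarrow}\Delta{\downarrow} \vdash \mathsf{select}_1\ {\downarrow}c{\downarrow} : {\downarrow}\pi_1\,c{:}\kappa_1{\downarrow}$ exp

5. $\Delta \vdash \pi_2\,c : \kappa_2$. The proof proceeds as in the previous case.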

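6. $\Delta \vdash (c_1, c_2) : \kappa_1 \times \kappa_2$

By assumption:
$\Delta \vdash c_1 : \kappa_1$
$\Delta \vdash c_2 : \kappa_2$

By induction:
$\Psi; |\Delta|; {\downarrow}\Delta{\downarrow} \vdash {\downarrow}c_1{\downarrow} : {\downarrow}c_1{:}\kappa_1{\downarrow}$ exp
$\Psi; |\Delta|; {\downarrow}\Delta{\downarrow} \vdash {\downarrow}c_2{\downarrow} : {\downarrow}c_2{:}\kappa_2{\downarrow}$ exp

By the derived pair introduction rule:
$\Psi; |\Delta|; {\downarrow}\Delta{\downarrow} \vdash ({\downarrow}c_1{\downarrow}, {\downarrow}c_2{\downarrow}) : {\downarrow}c_1{:}\kappa_1{\downarrow} \times {\downarrow}c_2{:}\kappa_2{\downarrow}$ dexp

By lemma 17:
$\Psi; |\Delta|; {\downarrow}\Delta{\downarrow} \vdash ({\downarrow}c_1{\downarrow}, {\downarrow}c_2{\downarrow}) : {\downarrow}c_1{:}\kappa_1{\downarrow} \times {\downarrow}c_2{:}\kappa_2{\downarrow}$ exp

By the definitions of the translations and the beta rule for pairs:
$\Psi; |\Delta|; {\downarrow}\Delta{\downarrow} \vdash {\downarrow}(c_1, c_2){\downarrow} : {\downarrow}(c_1, c_2){:}\kappa_1 \times \kappa_2{\downarrow}$ exp

7. The rest of the cases follow directly by construction, with appeals to previous lemmas where
necessary.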


5.4  Type  translations 

5.4.1  Proper  MIL  types 

The  DE  and  SE  translations  complete  the  apparatus  needed  to  translate  proper  MIL  constructors. 
Constructors  of  kind  T32  map  to  constructors  of  kind  Tmii,  which  permits  analysis  of  the  form  of  the 
original  constructor.  However,  since  proper  types  are  not  analyzable  in  the  MIL  (only  constructors 
of  kind  T32  can  be  analyzed)  the  translation  of  proper  MIL  types  does  not  map  into  the  Tmii  kind, 
but  instead  goes  directly  to  kind  T32. 

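\[
\begin{aligned}
|\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n](\tau_1, \ldots, \tau_m)(k) \to \tau| &\stackrel{\mathrm{def}}{=} \forall[\alpha_1{:}|\kappa_1|, \ldots, \alpha_n{:}|\kappa_n|]\\
&\qquad ({\downarrow}\alpha_1{:}\kappa_1{\downarrow}, \ldots, {\downarrow}\alpha_n{:}\kappa_n{\downarrow}, |\tau_1|, \ldots, |\tau_m|)(\mathsf{Float}_0, \ldots, \mathsf{Float}_{k-1}) \to |\tau|\\
|T(c)| &\stackrel{\mathrm{def}}{=} \mathsf{interp}\,|c|\\
|\tau_1 \times \cdots \times \tau_n| &\stackrel{\mathrm{def}}{=} \times[|\tau_1|, \ldots, |\tau_n|]
\end{aligned}
\]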

The  most  interesting  piece  of  this  translation  is  the  treatment  of  polymorphic  functions.  Additional 
arguments  corresponding  to  the  dynamic  representations  of  the  type  arguments  are  added  to  the 
parameter  list  of  the  function.  Other  types  are  translated  compositionally.  Notice  though  that  the 
translation  of  a  constructor  used  as  a  type  is  the  interpretation  of  the  translation  of  the  constructor. 
This  reflects  the  fact  that  proper  MIL  types  are  not  available  for  analysis  -  they  simply  serve  as 
classifiers. 
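As a rough illustration of the shape of this translation (not part of the formal development), the following Python sketch renders the translated type as a string. The class names `TCon`, `TProd` and `TForall`, and the textual spellings `|c|` and `[[a:k]]` for the static encoding and the type of the dynamic encoding, are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class TCon:
    """T(c): a constructor c used as a type."""
    con: str


@dataclass(frozen=True)
class TProd:
    """tau_1 x ... x tau_n."""
    fields: Tuple["MilType", ...]


@dataclass(frozen=True)
class TForall:
    """forall[a_1:k_1, ...](tau_1, ...)(k) -> tau."""
    tyvars: Tuple[Tuple[str, str], ...]  # (variable, kind) pairs
    args: Tuple["MilType", ...]
    nfloats: int                          # k, the number of float arguments
    result: "MilType"


MilType = Union[TCon, TProd, TForall]


def translate_type(t: MilType) -> str:
    """Render |t| as a string; |c|, |k| and [[a:k]] are left symbolic."""
    if isinstance(t, TCon):
        # A constructor used as a type becomes the interpretation of its encoding.
        return f"interp(|{t.con}|)"
    if isinstance(t, TProd):
        return "x[" + ", ".join(translate_type(f) for f in t.fields) + "]"
    if isinstance(t, TForall):
        binders = ", ".join(f"{a}:|{k}|" for a, k in t.tyvars)
        # One extra term parameter per type parameter: its dynamic representation.
        reps = [f"[[{a}:{k}]]" for a, k in t.tyvars]
        params = ", ".join(reps + [translate_type(a) for a in t.args])
        floats = ", ".join(f"Float_{j}" for j in range(t.nfloats))
        return f"forall[{binders}]({params})({floats}) -> {translate_type(t.result)}"
    raise TypeError(f"not a MIL type: {t!r}")
```

Calling `translate_type` on a polymorphic identity type shows the additional representation parameter appearing at the front of the parameter list, ahead of the translated ordinary arguments.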

5.4.2  The  term  typing  context  translation 

The  type  and  kind  translations  can  be  used  to  give  a  definition  of  the  translation  of  a  MIL  typing 
context  in  the  obvious  manner. 


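$$|\bullet| \stackrel{\mathrm{def}}{=} \bullet$$

$$|\Gamma, x{:}\tau| \stackrel{\mathrm{def}}{=} |\Gamma|, x{:}|\tau|$$

$$|\Gamma, x_{64}| \stackrel{\mathrm{def}}{=} |\Gamma|, x_{64}{:}\mathrm{Float}$$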
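The context translation is a simple pointwise map, which can be sketched as follows. This is only an illustration: the list-of-pairs representation of a context, the use of `None` to mark a 64-bit variable, and the `translate_type` callback are all assumptions of the example.

```python
from typing import Callable, List, Optional, Tuple

# A context entry is (variable, type); a type of None marks a 64-bit float variable.
Entry = Tuple[str, Optional[str]]


def translate_context(
    gamma: List[Entry], translate_type: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Translate each binding in order, leaving the domain unchanged."""
    out = []
    for name, ty in gamma:
        if ty is None:
            out.append((name, "Float"))            # 64-bit variables are floats
        else:
            out.append((name, translate_type(ty)))  # 32-bit variables get |tau|
    return out
```

Because the map never adds or removes variables, the source and translated contexts have the same domain.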

5.4.3  Proofs  of  soundness  for  the  proper  type  translations 

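Theorem 4 (Soundness of the type translation)

If $\Delta \vdash \tau\ \mathrm{ok}$ then $|\Delta| \vdash |\tau| : T_{32}$.

Proof: By induction on $\tau$. We proceed by cases:

• $\Delta \vdash T(c)\ \mathrm{ok}$

By assumption:

$$\Delta \vdash c : T_{32}$$

By theorem 2:

$$|\Delta| \vdash |c| : |T_{32}|$$

Note that $|T_{32}| = T_{mil}$.

By lemma 8:

$$|\Delta| \vdash \mathrm{interp} : T_{mil} \to T_{32}$$

By the application rule:

$$|\Delta| \vdash \mathrm{interp}(|c|) : T_{32}$$

• $\Delta \vdash \tau_1 \times \cdots \times \tau_n\ \mathrm{ok}$

By assumption:

$$\Delta \vdash \tau_1\ \mathrm{ok} \quad \ldots \quad \Delta \vdash \tau_n\ \mathrm{ok}$$

By induction:

$$|\Delta| \vdash |\tau_1| : T_{32} \quad \ldots \quad |\Delta| \vdash |\tau_n| : T_{32}$$

By construction:

$$|\Delta| \vdash \times[|\tau_1|, \ldots, |\tau_n|] : T_{32}$$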


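• $\Delta \vdash \forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n](\tau_1, \ldots, \tau_m)(k) \to \tau\ \mathrm{ok}$

By assumption:

$$\Delta \vdash \kappa_i\ \mathrm{ok} \qquad i \in 1 \ldots n$$
$$\Delta[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n] \vdash \tau_i\ \mathrm{ok} \qquad i \in 1 \ldots m$$
$$\Delta[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n] \vdash \tau\ \mathrm{ok}$$

By theorem 1 and lemma 12:

$$|\Delta| \vdash |\kappa_i|\ \mathrm{ok} \qquad i \in 1 \ldots n$$
$$|\Delta[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n]| \vdash \llbracket\alpha_i{:}\kappa_i\rrbracket : T_{32} \qquad i \in 1 \ldots n$$

By induction:

$$|\Delta[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n]| \vdash |\tau_i| : T_{32} \qquad i \in 1 \ldots m$$
$$|\Delta[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n]| \vdash |\tau| : T_{32}$$

By the function type introduction rule:

$$|\Delta| \vdash \forall[\alpha_1{:}|\kappa_1|, \ldots, \alpha_n{:}|\kappa_n|](\llbracket\alpha_1{:}\kappa_1\rrbracket, \ldots, \llbracket\alpha_n{:}\kappa_n\rrbracket, |\tau_1|, \ldots, |\tau_m|)(\mathrm{Float}_0, \ldots, \mathrm{Float}_{k-1}) \to |\tau| : T_{32}$$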


Lemma 18

1. If $x \notin \Gamma$ then $x \notin |\Gamma|$.

2. If $x_{64} \notin \Gamma$ then $x_{64} \notin |\Gamma|$.

Proof: Note that $\Gamma$ and $|\Gamma|$ have the same domain by construction.

Lemma 19 (Term context translation)

If $\Delta \vdash \Gamma\ \mathrm{ok}$ then $|\Delta| \vdash |\Gamma|\ \mathrm{ok}$.

Proof: By induction on derivations, using the domain lemma (lemma 18), the soundness lemma
for the constructor translation (lemma 7) and the soundness theorem for the type translation
(theorem 4).


Corollary 2

If $\Delta \vdash \Gamma\ \mathrm{ok}$ then $|\Delta| \vdash \llbracket\Delta\rrbracket, |\Gamma|\ \mathrm{ok}$.

Proof: By induction on $\Gamma$, using lemmas 18, 19 and 15. Note that we assume that indexed
variables $x_\alpha$ are drawn from a "fresh" set, and hence there is no interference between the domains
of $\llbracket\Delta\rrbracket$ and $|\Gamma|$.


5.4.4  The  type  translation  respects  equivalence 

Lemma 20 (The type translation respects equivalence)

If $\Delta \vdash \tau_1 \equiv \tau_2 : T_{32}$, then $|\Delta| \vdash |\tau_1| \equiv |\tau_2| : T_{32}$.


Proof: By induction on equivalence derivations, appealing to lemma 10 and the structural
equivalence rule for application in the base case ($T(c)$).


5.5  The  term  translation 

The translation from MIL terms to LIL terms is not much more complicated than the constructor
translation, but is syntactically somewhat more cumbersome. In order to ensure that the
appropriate type information is available for constructing LIL terms, it is convenient to phrase the
translations on terms as typed translations of the form $\Delta; \Gamma \vdash e : \tau \leadsto e'$, indicating that in the
typing context $\Delta; \Gamma$, a MIL term $e$ of type $\tau$ translates to a LIL term $e'$.


5.5.1  Definitions  for  the  term  translation 

In  order  to  more  concisely  state  the  translation  itself,  I  first  define  a  number  of  LIL  functions 
implementing  type  analysis  primitives.  In  a  practical  implementation,  it  may  be  desirable  to  inline 
some  or  all  of  these  functions:  for  the  most  part  however  this  is  purely  a  policy  decision  trading  off 
code  size  against  function  calls.  Note  though  that  in  general  it  is  not  always  possible  to  eliminate 
the  lambda  abstraction  since  the  type  analysis  mechanism  requires  that  certain  types  be  a  variable. 
In  the  case  that  the  type  to  be  analyzed  is  an  application  of  a  variable,  the  lambda  abstraction 
cannot  be  eliminated. 

The first defined form is the optimized array constructor, essentially as described in the example
from chapter 4. I again use ML style pattern matching at the term level for clarity of presentation.
Note that the definitions are closed and hence can be written directly as code.


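$$
\begin{array}{l}
\mathrm{array} \stackrel{\mathrm{def}}{=} \\
\quad \mathsf{code}[\alpha{:}T_{mil}](r_\alpha{:}R(\alpha),\ i{:}\mathrm{Int},\ v{:}\mathrm{interp}(\alpha))(){:}\mathrm{Array}(\alpha). \\
\quad \mathsf{case}\ r_\alpha \\
\qquad \mathrm{Record}\ x \Rightarrow \mathsf{vcase}\ \alpha\ (\mathrm{Record}\ \_ \Rightarrow \mathrm{array}_{\mathrm{interp}(\alpha)}(i, v) \mid \_ \Rightarrow \mathsf{dead}\ x) \\
\quad \mid \mathrm{BFloat}\ x \Rightarrow \mathsf{vcase}\ \alpha\ (\mathrm{BFloat}\ \_ \Rightarrow \mathrm{array}_{\mathrm{Float}}(i, \mathsf{unbox}\ v) \mid \_ \Rightarrow \mathsf{dead}\ x) \\
\quad \mid \mathrm{Other}\ x \Rightarrow \mathsf{vcase}\ \alpha\ (\mathrm{Other}\ \_ \Rightarrow \mathrm{array}_{\mathrm{interp}(\alpha)}(i, v) \mid \_ \Rightarrow \mathsf{dead}\ x)
\end{array}
$$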


Here the DE argument $r_\alpha$ is the dynamic encoding of the constructor represented by $\alpha$. As can
be seen from the definition of $R$, the type of $r_\alpha$ is therefore a term-level sum describing the top
level structure of the encoded type. The appropriate branch is thereby chosen via an ordinary case
statement and evaluated. The vcase serves to refine the type argument, reflecting the identity of
the type back into the type system.

In  addition  to  the  array  creation  function,  I  define  specialized  update  and  subscript  functions 
for  the  optimized  arrays. 


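$$
\begin{array}{l}
\mathrm{upd} \stackrel{\mathrm{def}}{=} \\
\quad \mathsf{code}[\alpha{:}T_{mil}](r_\alpha{:}R(\alpha),\ a{:}\mathrm{Array}(\alpha),\ i{:}\mathrm{Int},\ v{:}\mathrm{interp}(\alpha))(){:}\mathrm{Unit}. \\
\quad \mathsf{case}\ r_\alpha \\
\qquad \mathrm{Record}\ x \Rightarrow \mathsf{vcase}\ \alpha\ (\mathrm{Record}\ \_ \Rightarrow \mathrm{upd}_{\mathrm{interp}(\alpha)}(a, i, v) \mid \_ \Rightarrow \mathsf{dead}\ x) \\
\quad \mid \mathrm{BFloat}\ x \Rightarrow \mathsf{vcase}\ \alpha\ (\mathrm{BFloat}\ \_ \Rightarrow \mathrm{upd}_{\mathrm{Float}}(a, i, \mathsf{unbox}\ v) \mid \_ \Rightarrow \mathsf{dead}\ x) \\
\quad \mid \mathrm{Other}\ x \Rightarrow \mathsf{vcase}\ \alpha\ (\mathrm{Other}\ \_ \Rightarrow \mathrm{upd}_{\mathrm{interp}(\alpha)}(a, i, v) \mid \_ \Rightarrow \mathsf{dead}\ x)
\end{array}
$$

$$
\begin{array}{l}
\mathrm{sub} \stackrel{\mathrm{def}}{=} \\
\quad \mathsf{code}[\alpha{:}T_{mil}](r_\alpha{:}R(\alpha),\ a{:}\mathrm{Array}(\alpha),\ i{:}\mathrm{Int})(){:}\mathrm{interp}(\alpha). \\
\quad \mathsf{case}\ r_\alpha \\
\qquad \mathrm{Record}\ x \Rightarrow \mathsf{vcase}\ \alpha\ (\mathrm{Record}\ \_ \Rightarrow \mathrm{sub}_{\mathrm{interp}(\alpha)}(a, i) \mid \_ \Rightarrow \mathsf{dead}\ x) \\
\quad \mid \mathrm{BFloat}\ x \Rightarrow \mathsf{vcase}\ \alpha\ (\mathrm{BFloat}\ \_ \Rightarrow \mathsf{box}(\mathrm{sub}_{\mathrm{Float}}(a, i)) \mid \_ \Rightarrow \mathsf{dead}\ x) \\
\quad \mid \mathrm{Other}\ x \Rightarrow \mathsf{vcase}\ \alpha\ (\mathrm{Other}\ \_ \Rightarrow \mathrm{sub}_{\mathrm{interp}(\alpha)}(a, i) \mid \_ \Rightarrow \mathsf{dead}\ x)
\end{array}
$$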


Notice  that  all  of  the  array  primitives,  in  the  case  that  the  array  is  a  flattened  float  array,  will 
perform  boxing  or  unboxing.  If  the  type  is  statically  known,  both  the  branching  and  the  boxing 
can  be  optimized  away,  but  in  general  they  cannot  be  avoided.  This  makes  it  very  important  to 
recognize  statically  reducible  uses  of  type  dispatch. 
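The run-time behaviour of the three array primitives can be sketched in Python as follows. This is an untyped approximation under stated assumptions: the string constants standing in for values of type R(α), the `Boxed` class, and the function names `array_new`, `array_sub` and `array_upd` are all hypothetical, and the static refinement performed by vcase has no counterpart here.

```python
from array import array as flat_array

# Hypothetical run-time type representations (stand-ins for values of type R(alpha)).
RECORD, BFLOAT, OTHER = "Record", "BFloat", "Other"


class Boxed:
    """A boxed float: a heap cell holding one float."""

    def __init__(self, value: float):
        self.value = value


def array_new(rep, n, v):
    """Create an array of n copies of v, flattening boxed floats."""
    if rep == BFLOAT:
        return flat_array("d", [v.value] * n)  # unbox once, store unboxed doubles
    return [v] * n                             # ordinary array of word-sized values


def array_sub(rep, a, i):
    """Subscript: re-box the element when the array is a flat float array."""
    if rep == BFLOAT:
        return Boxed(a[i])
    return a[i]


def array_upd(rep, a, i, v):
    """Update: unbox the new element when the array is a flat float array."""
    if rep == BFLOAT:
        a[i] = v.value
    else:
        a[i] = v
```

Every access to a flattened array pays for a branch on the representation plus a box or unbox, which is the cost that static reduction of the dispatch removes when the element type is known.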

The vararg and onearg primitives from the MIL correspond in a similar way to the defined
vararg and onearg functions in the LIL. Notice that, as with the Vararg type function, I choose
to make vararg polymorphic over the encoding of the argument type, but over the interpretation
of the encoded return type (that is, the return type itself).
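$$
\begin{array}{l}
\mathrm{vararg} \stackrel{\mathrm{def}}{=} \\
\quad \mathsf{code}[\alpha{:}T_{mil}, \rho{:}T_{32}](r_\alpha{:}R(\alpha),\ f{:}(\mathrm{interp}(\alpha) \to \rho))(){:}\mathrm{Vararg}(\alpha)(\rho). \\
\quad \mathsf{case}\ r_\alpha\ ( \\
\qquad \mathrm{Record}\ y \Rightarrow \mathsf{vcase}\ \alpha\ ( \\
\qquad\quad \mathrm{Record}\ \alpha' \Rightarrow \mathsf{case}\ y \\
\qquad\qquad \mid \mathsf{inj}_0\ x \Rightarrow \mathsf{vcase}\ \alpha'\ ([\,] \Rightarrow \lambda().(f()) \mid \_ \Rightarrow \mathsf{dead}\ x) \\
\qquad\qquad \mid \mathsf{inj}_1\ x \Rightarrow \mathsf{vcase}\ \alpha'\ ([\beta] \Rightarrow \lambda(y{:}\beta).(f(y)) \mid \_ \Rightarrow \mathsf{dead}\ x) \\
\qquad\qquad \mid \mathsf{inj}_2\ x \Rightarrow \mathsf{vcase}\ \alpha'\ ([\beta_1, \beta_2] \Rightarrow \lambda(y_1{:}\beta_1, y_2{:}\beta_2).(f(y_1, y_2)) \mid \_ \Rightarrow \mathsf{dead}\ x) \\
\qquad\qquad \mid \mathsf{inj}_3\ \_ \Rightarrow f \\
\qquad\quad \mid \_ \Rightarrow \mathsf{dead}\ y) \\
\qquad \mid \_ \Rightarrow f)
\end{array}
$$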

The implementations of vararg and onearg are more complicated than those of the array
primitives. In addition to dispatching on the top level form of the type of the argument, they must also
distinguish  between  record  types  with  different  numbers  of  fields.  In  the  case  that  the  argument 
type  is  a  record  type,  both  vararg  and  onearg  must  consider  the  number  of  fields  of  the  record  to 
determine  whether  or  not  to  emit  a  wrapper  function,  and  what  the  argument  types  should  be.  For 
vararg,  this  wrapper  function  packages  its  arguments  into  a  record  and  passes  it  on  to  the  original 
function.  The  onearg  construct  simply  reverses  this. 
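The wrapping behaviour described here can be sketched in untyped Python. The representation of a record type as the pair `("Record", n)`, and the cutoff `MAX_FLAT = 2`, are assumptions of this example; the cutoff simply mirrors the arities 0, 1 and 2 that the definitions handle explicitly.

```python
MAX_FLAT = 2  # hypothetical cutoff: records with more fields are not flattened


def vararg(rep, f):
    """Turn a function on a record into a function on separate arguments.

    rep is ("Record", n) for an n-field record type, or ("Other",) otherwise;
    f takes a single tuple argument.
    """
    if rep[0] == "Record" and rep[1] <= MAX_FLAT:
        n = rep[1]

        def wrapper(*args):
            if len(args) != n:
                raise TypeError(f"expected {n} arguments, got {len(args)}")
            return f(tuple(args))  # package the arguments into a record

        return wrapper
    return f  # no wrapper is emitted for other argument types


def onearg(rep, g):
    """Reverse of vararg: turn a multi-argument function into one on records."""
    if rep[0] == "Record" and rep[1] <= MAX_FLAT:
        return lambda record: g(*record)  # unpack the record into arguments
    return g
```

For a two-field record type, `onearg(rep, vararg(rep, f))` behaves like `f` on every record, and for argument types that are not small records both functions return their argument unchanged.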

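$$
\begin{array}{l}
\mathrm{onearg} \stackrel{\mathrm{def}}{=} \\
\quad \mathsf{code}[\alpha{:}T_{mil}, \rho{:}T_{32}](r_\alpha{:}R(\alpha),\ f{:}\mathrm{Vararg}(\alpha)(\rho))(){:}(\mathrm{interp}(\alpha) \to \rho). \\
\quad \mathsf{case}\ r_\alpha\ ( \\
\qquad \mathrm{Record}\ y \Rightarrow \mathsf{vcase}\ \alpha\ ( \\
\qquad\quad \mathrm{Record}\ \alpha' \Rightarrow \mathsf{case}\ y \\
\qquad\qquad \mid \mathsf{inj}_0\ x \Rightarrow \mathsf{vcase}\ \alpha'\ ([\,] \Rightarrow \lambda(y{:}\mathrm{unit}).(f()) \mid \_ \Rightarrow \mathsf{dead}\ x) \\
\qquad\qquad \mid \mathsf{inj}_1\ x \Rightarrow \mathsf{vcase}\ \alpha'\ ([\beta] \Rightarrow \lambda((y){:}\times[\beta]).(f(y)) \mid \_ \Rightarrow \mathsf{dead}\ x) \\
\qquad\qquad \mid \mathsf{inj}_2\ x \Rightarrow \mathsf{vcase}\ \alpha'\ ([\beta_1, \beta_2] \Rightarrow \lambda((y_1, y_2){:}\times[\beta_1, \beta_2]).(f(y_1, y_2)) \mid \_ \Rightarrow \mathsf{dead}\ x) \\
\qquad\qquad \mid \mathsf{inj}_3\ \_ \Rightarrow f \\
\qquad\quad \mid \_ \Rightarrow \mathsf{dead}\ y) \\
\qquad \mid \_ \Rightarrow f)
\end{array}
$$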

I  assign  a  LIL  heap  label  to  each  of  the  named  definitions:  for  example,  vararg  is  the  label  given 
to  the  vararg  definition. 
5.5.2 The term level translation

The  complete  MIL  to  LIL  term  translation  is  given  in  section  5.6,  but  a  selection  of  example  rules 
from  the  translation  are  discussed  below  to  illustrate  the  important  points  of  the  translation. 

Small  values 

The small value translation judgement is of the form $\Delta; \Gamma \vdash sv : \tau \leadsto sv'$, and is for the most part
uncomplicated.

Injections  into  non-value  carrying  arms  of  sum  types  are  translated  into  uses  of  union  types, 
as  described  previously.  Note  that  injections  into  value-carrying  arms  are  not  values  in  the  MIL 
since  they  require  allocation.  Their  translation  is  therefore  described  in  the  operation  translation. 


A;r  h  :  Sum^(^  inj_union|  tag^- 


Float  values 

The float value translation judgement is of the form $\Delta; \Gamma \vdash fv : \mathrm{Float} \leadsto fv'$. Float values in the
MIL and the LIL correspond exactly; hence the translation leaves the terms unchanged.


64  bit  operations 

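The 64 bit operation translation judgement is of the form $\Delta; \Gamma \vdash i : \mathrm{Float} \leadsto i'$, and is also relatively
uncomplicated. For example, the floating point subscript operation changes in only minor syntactic
ways.

$$\frac{\Delta; \Gamma \vdash sv_1 : \mathrm{Farray} \leadsto sv_1' \qquad \Delta; \Gamma \vdash sv_2 : \mathrm{Int} \leadsto sv_2'}{\Delta; \Gamma \vdash \mathrm{fsub}(sv_1, sv_2) : \mathrm{Float} \leadsto \mathrm{sub}_{\mathrm{Float}}(sv_1', sv_2')}$$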


32  bit  operations 

The 32 bit operation translation judgement is of the form $\Delta; \Gamma \vdash i : \tau \leadsto i'$. Much of the interesting
work of the translation takes place in the operations.

Although  the  most  significant  part  of  the  MIL  to  LIL  translation  is  the  representation  of  type 
analysis  within  the  term  language,  the  definitions  given  in  section  5.5.1  isolate  all  of  the  uses  of 
type  analysis.  Consequently,  the  translation  of  type  analyzing  primitives  such  as  vararg  is  very 
straightforward:  they  simply  become  calls  to  the  code  definitions  via  the  appropriate  heap  labels. 

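$$\frac{\Delta; \Gamma \vdash sv : c_1 \to c_2 \leadsto sv'}{\Delta; \Gamma \vdash \mathrm{vararg}_{c_1 \to c_2}\, sv : \mathrm{Vararg}_{c_1 \to c_2} \leadsto \mathsf{call}\ \mathrm{vararg}[|c_1|, \mathrm{interp}\,|c_2|](\llbracket c_1\rrbracket, sv')}$$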

Notice  that  as  previously  discussed,  I  pass  the  interpretation  of  the  encoded  return  type  to  the 
vararg  code,  instead  of  the  encoded  type  itself.  The  dynamic  encoding  of  the  argument  type  is 
passed  as  an  additional  argument  to  the  function  to  allow  the  dispatch  on  types  to  take  place. 
Sum types become union types, with the tagging of records made explicit.

A;  r  h  St! :  Cj  ^  sv' 

h  sv:  Suin^(^  inj_union|  s^^(^|(tag^.,  s?;') 

As  discussed  in  the  constructor  translation,  the  known  sum  type  from  the  MIL  disappears  in 
the  LIL.  Uses  of  the  projection  from  this  type  become  second  projections  out  of  a  tag- value  pair. 

A;  r  h  SI! :  Sum^  (c)  sv' 

A;r  h  proj^^^  sv  :  Cj  ^  select^  sv' 
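The run-time shape of translated sums can be sketched in Python as follows. The function names are hypothetical, and integer tags with tuples for tag-value pairs are an assumption of the example rather than the LIL representation itself.

```python
def inj_constant(j):
    """Injection into a non-value-carrying arm: the tag alone."""
    return j


def inj_carrying(j, v):
    """Injection into a value-carrying arm: an explicitly tagged pair."""
    return (j, v)


def tag_of(sv):
    """Recover the tag from either form of sum value."""
    return sv[0] if isinstance(sv, tuple) else sv


def proj_known(sv):
    """Projection from a sum known to be in a value-carrying arm.

    Since the arm is known, no tag check is needed: the carried value
    is simply the second component of the tag-value pair.
    """
    return sv[1]
```

A value built with `inj_carrying(2, "leaf")` is the pair `(2, "leaf")`, and `proj_known` returns `"leaf"` from it without inspecting the tag.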

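In the LIL the exception type is no longer primitive, instead being replaced by uses of
existentials and pairs.

$$\frac{\Delta; \Gamma \vdash sv_1 : \mathrm{Dyntag}_c \leadsto sv_1' \qquad \Delta; \Gamma \vdash sv_2 : c \leadsto sv_2'}{\Delta; \Gamma \vdash \mathrm{inj\_dyn}_c(sv_1, sv_2) : \mathrm{Exn} \leadsto \mathsf{pack}\,(sv_1', sv_2')\ \mathsf{as}\ \exists\alpha{:}T_{32}.(\mathrm{Dyntag}(\alpha) \times \alpha)\ \mathsf{hiding}\ |c|}$$

Since the exception type is closed, I will frequently refer to it via the following definition for brevity:

$$\mathrm{Dyn} \stackrel{\mathrm{def}}{=} \exists\alpha{:}T_{32}.(\mathrm{Dyntag}(\alpha) \times \alpha)$$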
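The encoding of exceptions as a hidden payload type paired with a dynamic tag can be sketched in Python. This is only an untyped analogy: the class `Dyntag` and the functions `inj_dyn` and `exncase` are names chosen for the example, and object identity stands in for the run-time tag comparison.

```python
class Dyntag:
    """A freshly generated exception tag; identity distinguishes tags."""

    def __init__(self, name):
        self.name = name  # hypothetical label, used only for debugging


def inj_dyn(tag, value):
    """Build an exception packet: a pair of a tag and a payload.

    In the typed setting the payload type is hidden by the existential,
    so only the tag reveals what the payload is.
    """
    return (tag, value)


def exncase(exn, tag, on_match, default):
    """Compare tags; on a match the payload may be used at the tag's type."""
    packet_tag, payload = exn
    if packet_tag is tag:
        return on_match(payload)
    return default()
```

Two tags created with the same label are still distinct, so a handler for one never receives the payload of the other.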

One  of  the  most  illustrative  rules  is  the  rule  for  the  translation  of  applications,  since  it  makes 
clear  the  “plumbing”  work  necessary  to  make  type  analysis  explicit. 

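$$\frac{\begin{array}{c}\Delta; \Gamma \vdash sv : \forall[\alpha_1{::}\kappa_1, \ldots, \alpha_n{::}\kappa_n](\tau_1, \ldots, \tau_m)(k) \to \tau \leadsto sv' \\ \Delta; \Gamma \vdash sv_i : \tau_i[c_1/\alpha_1, \ldots, c_n/\alpha_n] \leadsto sv_i' \qquad \Delta; \Gamma \vdash fv_i : \mathrm{Float} \leadsto fv_i'\end{array}}{\begin{array}{c}\Delta; \Gamma \vdash sv[c_1, \ldots, c_n](sv_1, \ldots, sv_m)(fv_1, \ldots, fv_k) : \tau[c_1/\alpha_1, \ldots, c_n/\alpha_n] \leadsto \\ sv'[|c_1|, \ldots, |c_n|](\llbracket c_1\rrbracket, \ldots, \llbracket c_n\rrbracket, sv_1', \ldots, sv_m')(fv_1', \ldots, fv_k')\end{array}}$$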

Notice  that  each  type  argument  is  passed  to  the  function  once  as  a  type  argument  in  its  static 
encoding,  and  once  as  a  term  argument  in  its  dynamic  encoding.  The  rest  of  the  arguments 
are  translated  using  the  small  value  and  float  value  translations  and  are  passed  as  usual.  The 
application  translation  will  of  course  be  paralleled  exactly  by  the  translation  of  functions  in  the 
expression  translation  below. 
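The plumbing in the application rule can be sketched as a small string-building function. The helper names and the textual spellings `|c|` (static encoding) and `[[c]]` (dynamic encoding) are assumptions made for this example only.

```python
def static_enc(c):
    """Spell the static encoding |c| of a constructor."""
    return f"|{c}|"


def dynamic_enc(c):
    """Spell the dynamic encoding [[c]] of a constructor."""
    return f"[[{c}]]"


def translate_app(fn, cons, small_args, float_args):
    """Render the translated application.

    fn, small_args and float_args are assumed to be already translated;
    cons are the source-level constructor arguments.
    """
    tyargs = ", ".join(static_enc(c) for c in cons)
    # Dynamic encodings come first in the term argument list,
    # followed by the ordinary (already translated) small values.
    args = ", ".join([dynamic_enc(c) for c in cons] + list(small_args))
    fargs = ", ".join(float_args)
    return f"{fn}[{tyargs}]({args})({fargs})"
```

Each constructor argument therefore appears twice in the output, once between the square brackets and once at the front of the term arguments.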

The  term  translation  judgements 

The term level translation judgement is of the form $\Delta; \Gamma \vdash e : \tau \leadsto e'$. For the most part, the
expression translation simply appeals to the various other judgements. The only interesting case is
that of polymorphic functions.

A;  r[/  :  V[a:'!fi:](r)(|xj|)  ^  r^,  av.n,  xTt,  xy]  h  ey  :  r,.  e'y 
A;r[/  :  V[a:':K](r)(|xy|)  h  e:r e' 

A;  r  h  let,-  rec^-^  /[a:TK](xrr)(xy).ey  ine  :  V[a:'!K](r)(|xy|)  t  ^ 
let|.,-|  recp^l  /[q;::|k|](xq,:  xiT)(xy:Float).ey  in e' 
As expected from the application case, each type argument is both lifted to the static encoding
kind and used to construct a dynamic encoding type for a new term level argument serving as its
dynamic encoding.


Programs 

The  final  LIL  program  is  produced  from  a  closed  MIL  expression  by  wrapping  the  result  of  the 
expression  translation  with  a  heap  binding  the  defined  forms  used  in  the  translation. 


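$$
d_i \stackrel{\mathrm{def}}{=} \bullet,\ 
\begin{array}[t]{lll}
\mathrm{sub} & \forall[\alpha{:}T_{mil}]\ \mathrm{Code}[R(\alpha), \mathrm{Array}(\alpha), \mathrm{Int}][\,](\mathrm{interp}(\alpha)) & \mathrm{sub}, \\
\mathrm{upd} & \forall[\alpha{:}T_{mil}]\ \mathrm{Code}[R(\alpha), \mathrm{Array}(\alpha), \mathrm{Int}, \mathrm{interp}(\alpha)][\,](\mathrm{unit}) & \mathrm{upd}, \\
\mathrm{array} & \forall[\alpha{:}T_{mil}]\ \mathrm{Code}[R(\alpha), \mathrm{Int}, \mathrm{interp}(\alpha)][\,](\mathrm{Array}(\alpha)) & \mathrm{array}, \\
\mathrm{vararg} & \forall[\alpha{:}T_{mil}, \rho{:}T_{32}]\ \mathrm{Code}[R(\alpha), (\mathrm{interp}(\alpha) \to \rho)][\,](\mathrm{Vararg}(\alpha)(\rho)) & \mathrm{vararg}, \\
\mathrm{onearg} & \forall[\alpha{:}T_{mil}, \rho{:}T_{32}]\ \mathrm{Code}[R(\alpha), \mathrm{Vararg}(\alpha)(\rho)][\,](\mathrm{interp}(\alpha) \to \rho) & \mathrm{onearg}
\end{array}
$$

Each row gives a heap label, its type, and the definition bound to it.

$$\frac{\bullet; \bullet \vdash e : \tau \leadsto e'}{\vdash e : \tau \leadsto \mathsf{letrec}\ d_i\ \mathsf{in}\ e' \ \ \mathrm{prog}}$$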

5.5.3  Proof  that  the  term  level  translation  preserves  typing 
Small  Values 

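Theorem 5 (Small value translation)

If $\Delta; \Gamma \vdash sv : \tau$, $\vdash \Psi\ \mathrm{ok}$ and $\Delta; \Gamma \vdash sv : \tau \leadsto sv'$ then $\Psi; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv' : |\tau|$.

Proof: By induction on terms.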

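1. If $sv = x$ then

By assumption:

$$\Delta; \Gamma, x{:}\tau \vdash x : \tau$$
$$\vdash \Psi\ \mathrm{ok}$$

By inverting the assumption:

$$\Delta \vdash \Gamma, x{:}\tau\ \mathrm{ok}$$

By lemma 19:

$$|\Delta| \vdash \llbracket\Delta\rrbracket, |\Gamma, x{:}\tau|\ \mathrm{ok}$$

By the variable rule and the definition of $|\Gamma|$:

$$\Psi; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma|, x{:}|\tau| \vdash x : |\tau|$$

2. If $sv = i$ then

By assumption:

$$\Delta; \Gamma \vdash i : \mathrm{Int}$$
$$\vdash \Psi\ \mathrm{ok}$$

By inverting the assumption:

$$\Delta \vdash \Gamma\ \mathrm{ok}$$

By lemma 19:

$$|\Delta| \vdash \llbracket\Delta\rrbracket, |\Gamma|\ \mathrm{ok}$$

By the int rule:

$$\Psi; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash i : \mathrm{Int}$$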

3.  If  sv  =  unrollc  sv  then 

By  assumption: 

A;rh  sv  :  TTj  ^(a,/3)(ci,C2) 

A  h  c  =  7Tifi{a,P){ci,C2):T32 
A\-  sv  :  TTi  (5)  (ci  ,02)^  sv' 

By  induction: 

|A|;  JAt,  |r|  h  s?;' :  |  vr*  ^(a, /3)(ci,  C2)| 

By  lemma  10: 

|A|  h  |c|  =  1 7ri/i(a,/3)(ci,C2)|  :  T^ii 
By  definition: 

I  TTi  n{a,  /?)(ci,  C2)|  =  Rec[l  +  l](/)(injj  *) 
where/  =  A(/>:1  +  1  ^  T32).A(a;.l  +  1). 
case  a; 

inj^  _  ^  mterp(|ci|[Other(/>(inji  >i=))/a,  Other(p(inj2  *))//?]) 

I  inj2  -  interpd  C2|[Other(p(inji  *))/«,  Other(p(inj2  *))//?]) 

Therefore,  by  the  unroll  rule: 

|A|;JAl,,  |r|  h  unroll|c|  sv' :  f{Rec[l  +  l](/))(inji  *) 

It  suffices  to  show  that: 

/(Rec[l  +  l](/))(injj>i=)  =  mterp(|ci[7ri  ^(a, /3)(ci,  C2)/q;,  7r2  ^(a, /3)(ci,  C2)//3]|) 

By  lemma  9: 

|ci[7ri  n{a,(5){ci,C2)/a,TT2n{a,P){ci,C2)/P]\  = 

|ci|[|  TTi  /3)(ci,  C2)|/a,  |  tt2  /u(a,  /3)(ci,  C2)|//3] 

By  definition: 

I  TTi  fi{a,/3){ci,C2)\  =  Other(Rec[l  +  l](/)(inj^  *)) 

1 7r2/i(a,/3)(ci,C2)|  =  Other(Rec[l  +  l](/)(inj2  *)) 

So  it  suffices  to  show  that: 

/(Rec[l  +  l](/))(inji*)  = 

interp(|ci|[Other(Rec[l  +  l](/)(inji  >i=))/a,  Other(Rec[l  +  l](/)(inj2  *))//?]) 

By  definition  of  /: 

/(Rec[l  +  l](/))(inji*)  = 
case  (inj j  *) 

interp(|ci|[Other(Rec[l  +  l](/)(inj^  >i=))/q;,  Other(Rec[l  +  l](/)(inj2  >i=))//3]) 
I  inj2  - 

interp(|c2|[Other(Rec[l  +  l](/)(inj^  *))/q;,  Other(Rec[l  +  l](/)(inj2  *))//?]) 
If i = 1, then by the beta rule:

/(Rec[l  +  l](/))(inji*)  = 

interp(|ci|[Other(Rec[l  +  l](/)(inj^  >i=))/a,  Other(Rec[l  +  l](/)(inj2  *))//?]) 

If  i  =  2,  then  by  the  beta  rule: 

/(Rec[l  +  l](/))(inj2*)  = 

interp(|c2|[Other(Rec[l  +  l](/)(inj^  >f=))/a,  Other(Rec[l  +  l](/)(inj2  *))//?]) 

4.  The  roll  case  follows  similarly. 

5.  If  sv  =  then: 

By  inversion  of  the  first  assumption: 

A  h  r  ok 

By  lemma  19: 

|A|  hJAt,|r|  ok 

Therefore  by  the  tag  rule: 

|A|;  JAt,  |r|  b  tag^-  :  Tag(j) 

Note  that 

I  Sumj(ci,  ...,Cn)\  = 

V[Tag(0), . . .  ,Tag(i  -  1), ,  x[Tag(i),interp  |ci|], . . . ,  x [Tag(n), interp  |cn|]] 

So  by  the  union  formation  rule  : 

|A|;JAt,  |r|  b  inj_union|s^^(^|tagj  :|Sumj(^| 


Float  values 

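Theorem 6 (Float value translation)

If $\Delta; \Gamma \vdash fv : \mathrm{Float}$, $\vdash \Psi\ \mathrm{ok}$ and $\Delta; \Gamma \vdash fv : \mathrm{Float} \leadsto fv'$ then $\Psi; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash fv' : \mathrm{Float}$.

Proof:

1. If $fv = x_f$ then

By assumption:

$$\Delta; \Gamma, x_f \vdash x_f : \mathrm{Float}$$
$$\vdash \Psi\ \mathrm{ok}$$

By inverting the assumption:

$$\Delta \vdash \Gamma, x_f\ \mathrm{ok}$$

By lemma 19:

$$|\Delta| \vdash \llbracket\Delta\rrbracket, |\Gamma, x_f|\ \mathrm{ok}$$

By the variable rule and the definition of $|\Gamma|$:

$$\Psi; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma|, x_f{:}\mathrm{Float} \vdash x_f : \mathrm{Float}$$

2. If $fv = r$ then

By assumption:

$$\Delta; \Gamma \vdash r : \mathrm{Float}$$
$$\vdash \Psi\ \mathrm{ok}$$

By inverting the assumption:

$$\Delta \vdash \Gamma\ \mathrm{ok}$$

By lemma 19:

$$|\Delta| \vdash \llbracket\Delta\rrbracket, |\Gamma|\ \mathrm{ok}$$

By the float rule:

$$\Psi; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash r : \mathrm{Float}$$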


64  bit  operations 

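Theorem 7 (Float operation translation)

If $\Delta; \Gamma \vdash i : \mathrm{Float}$, $\vdash \Psi\ \mathrm{ok}$, and $\Delta; \Gamma \vdash i : \mathrm{Float} \leadsto i'$ then $\Psi; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash i' : \mathrm{Float}\ \ \mathrm{opr}_{64}$.

Proof:

1. If $i = fv$ then:

By inversion of assumptions:

$$\Delta; \Gamma \vdash fv : \mathrm{Float}$$
$$\Delta; \Gamma \vdash fv : \mathrm{Float} \leadsto fv'$$

By theorem 6:

$$\Psi; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash fv' : \mathrm{Float}$$

By the float value inclusion rule:

$$\Psi; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash fv' : \mathrm{Float}\ \ \mathrm{opr}_{64}$$

2. If $i = \mathrm{unboxf}\ sv$ then:

By inversion of assumptions:

$$\Delta; \Gamma \vdash sv : \mathrm{Boxf}$$
$$\Delta; \Gamma \vdash sv : \mathrm{Boxf} \leadsto sv'$$

By theorem 5:

$$\Psi; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv' : \mathrm{Boxed}\ \mathrm{Float}$$

By the unbox rule:

$$\Psi; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \mathsf{unbox}\ sv' : \mathrm{Float}\ \ \mathrm{opr}_{64}$$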

3.  If  i  =  f  sub(s?;i,  s?;2)  then: 

By  inversion  of  assumptions: 

A;  r  h  sill :  Farray 

A;  r  h  SV2  '■  Int 

A;  r  h  sill :  Farray  sv'-^ 

A;  r  h  SV2  :  Int  ^  sv'2 

By  theorem  5: 

vkslAhJA  t,  |r|  h  sv'i  :  ArrayQ4Float 
'k;  |A|;  JAl.,  IFI  h  sv'2  ■ 

Hence  by  the  64  bit  subscript  rule  : 

'k;  lAl;  JAt,  irl  h  subFioat(s?^i,  opr64 
Operations and expressions

I  begin  with  a  lemma  to  capture  the  appropriate  typing  conditions  for  the  definitions  used  by  the 
translation  of  the  type  analyzing  primitives. 

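Lemma 21 (Definitions)

1. If $\vdash \Psi\ \mathrm{ok}$ then

$$\Psi \vdash \mathrm{sub} : \forall[\alpha{:}T_{mil}]\ \mathrm{Code}[R(\alpha), \mathrm{Array}(\alpha), \mathrm{Int}][\,](\mathrm{interp}(\alpha)) \ \ \mathrm{hval}$$

2. If $\vdash \Psi\ \mathrm{ok}$ then

$$\Psi \vdash \mathrm{upd} : \forall[\alpha{:}T_{mil}]\ \mathrm{Code}[R(\alpha), \mathrm{Array}(\alpha), \mathrm{Int}, \mathrm{interp}(\alpha)][\,](\mathrm{unit}) \ \ \mathrm{hval}$$

3. If $\vdash \Psi\ \mathrm{ok}$ then

$$\Psi \vdash \mathrm{array} : \forall[\alpha{:}T_{mil}]\ \mathrm{Code}[R(\alpha), \mathrm{Int}, \mathrm{interp}(\alpha)][\,](\mathrm{Array}(\alpha)) \ \ \mathrm{hval}$$

4. If $\vdash \Psi\ \mathrm{ok}$ then

$$\Psi \vdash \mathrm{vararg} : \forall[\alpha{:}T_{mil}, \rho{:}T_{32}]\ \mathrm{Code}[R(\alpha), (\mathrm{interp}(\alpha) \to \rho)][\,](\mathrm{Vararg}(\alpha)(\rho)) \ \ \mathrm{hval}$$

5. If $\vdash \Psi\ \mathrm{ok}$ then

$$\Psi \vdash \mathrm{onearg} : \forall[\alpha{:}T_{mil}, \rho{:}T_{32}]\ \mathrm{Code}[R(\alpha), \mathrm{Vararg}(\alpha)(\rho)][\,](\mathrm{interp}(\alpha) \to \rho) \ \ \mathrm{hval}$$

Proof: By construction.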

■ 

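I define a heap context $\Psi_i$ that gives types for the above definitions as follows:

$$
\Psi_i \stackrel{\mathrm{def}}{=}
\begin{array}[t]{l}
\mathrm{sub} : \forall[\alpha{:}T_{mil}]\ \mathrm{Code}[R(\alpha), \mathrm{Array}(\alpha), \mathrm{Int}][\,](\mathrm{interp}(\alpha)), \\
\mathrm{upd} : \forall[\alpha{:}T_{mil}]\ \mathrm{Code}[R(\alpha), \mathrm{Array}(\alpha), \mathrm{Int}, \mathrm{interp}(\alpha)][\,](\mathrm{unit}), \\
\mathrm{array} : \forall[\alpha{:}T_{mil}]\ \mathrm{Code}[R(\alpha), \mathrm{Int}, \mathrm{interp}(\alpha)][\,](\mathrm{Array}(\alpha)), \\
\mathrm{vararg} : \forall[\alpha{:}T_{mil}, \rho{:}T_{32}]\ \mathrm{Code}[R(\alpha), (\mathrm{interp}(\alpha) \to \rho)][\,](\mathrm{Vararg}(\alpha)(\rho)), \\
\mathrm{onearg} : \forall[\alpha{:}T_{mil}, \rho{:}T_{32}]\ \mathrm{Code}[R(\alpha), \mathrm{Vararg}(\alpha)(\rho)][\,](\mathrm{interp}(\alpha) \to \rho)
\end{array}
$$

Lemma 22 (Initial context)

$$\vdash \Psi_i\ \mathrm{ok}$$

Proof: By construction.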

■ 

Operations  and  expressions  are  mutually  recursive:  hence  their  soundness  theorems  are  stated 
most  naturally  together.  Since  the  translation  of  operations  generates  calls  to  the  code  functions 
implementing  type  analysis,  the  soundness  theorem  for  the  translation  is  defined  relative  to  the 
initial  context  defined  above. 

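Theorem 8 (Operation and expression translation)

1. If $\Delta; \Gamma \vdash i : \tau$ and $\Delta; \Gamma \vdash i : \tau \leadsto i'$ then $\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash i' : |\tau|\ \ \mathrm{opr}_{32}$

2. If $\Delta; \Gamma \vdash e : \tau$ and $\Delta; \Gamma \vdash e : \tau \leadsto e'$ then $\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash e' : |\tau|\ \ \mathrm{exp}$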

Proof: 

The  proof  proceeds  by  simultaneous  induction  on  operations  and  expressions.  We  begin  with 
the  operations. 

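1. If $i = sv$ then:

By inversion of assumptions:

$$\Delta; \Gamma \vdash sv : \tau$$
$$\Delta; \Gamma \vdash sv : \tau \leadsto sv'$$

By theorem 5:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv' : |\tau|$$

By the small value expression inclusion rule:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv' : |\tau|\ \ \mathrm{opr}_{32}$$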

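2. If $i = \mathrm{vararg}_{c_1 \to c_2}\, sv$ then:

By inversion of assumptions:

$$\Delta \vdash c_1 : T_{32}$$
$$\Delta \vdash c_2 : T_{32}$$
$$\Delta; \Gamma \vdash sv : c_1 \to c_2$$
$$\Delta; \Gamma \vdash sv : c_1 \to c_2 \leadsto sv'$$

By theorem 2:

$$|\Delta| \vdash |c_1| : T_{mil}$$
$$|\Delta| \vdash |c_2| : T_{mil}$$

By lemma 8:

$$|\Delta| \vdash \mathrm{interp}\,|c_2| : T_{32}$$

By theorem 3 and weakening:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \llbracket c_1\rrbracket : R(|c_1|) \qquad (\text{note } \llbracket c_1{:}T_{32}\rrbracket = R(|c_1|))$$

By theorem 5:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv' : |c_1 \to c_2|$$

By the initial context lemma (22):

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \mathrm{vararg} : \forall[\alpha{:}T_{mil}, \rho{:}T_{32}]\ \mathrm{Code}[R(\alpha), \mathrm{interp}(\alpha) \to \rho][\,](\mathrm{Vararg}(\alpha)(\rho))$$

By the code call rule:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \mathsf{call}\ \mathrm{vararg}[|c_1|, \mathrm{interp}\,|c_2|](\llbracket c_1\rrbracket, sv') : \mathrm{Vararg}(|c_1|)(\mathrm{interp}\,|c_2|)\ \ \mathrm{opr}_{32}$$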

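3. If $i = \mathrm{onearg}_{c_1 \to c_2}\, sv$ then:

By inversion of assumptions:

$$\Delta \vdash c_1 : T_{32}$$
$$\Delta \vdash c_2 : T_{32}$$
$$\Delta; \Gamma \vdash sv : \mathrm{Vararg}_{c_1 \to c_2}$$
$$\Delta; \Gamma \vdash sv : \mathrm{Vararg}_{c_1 \to c_2} \leadsto sv'$$

By theorem 2:

$$|\Delta| \vdash |c_1| : T_{mil}$$
$$|\Delta| \vdash |c_2| : T_{mil}$$

By lemma 8:

$$|\Delta| \vdash \mathrm{interp}\,|c_2| : T_{32}$$

By theorem 3 and weakening:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \llbracket c_1\rrbracket : R(|c_1|) \qquad (\text{note } \llbracket c_1{:}T_{32}\rrbracket = R(|c_1|))$$

By theorem 5:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv' : \mathrm{Vararg}(|c_1|)(\mathrm{interp}\,|c_2|)$$

(Note $|\mathrm{Vararg}_{c_1 \to c_2}| = \mathrm{Vararg}(|c_1|)(\mathrm{interp}\,|c_2|)$.)

By the initial context lemma (22):

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \mathrm{onearg} : \forall[\alpha{:}T_{mil}, \rho{:}T_{32}]\ \mathrm{Code}[R(\alpha), \mathrm{Vararg}(\alpha)(\rho)][\,](\mathrm{interp}(\alpha) \to \rho)$$

By the code call rule:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \mathsf{call}\ \mathrm{onearg}[|c_1|, \mathrm{interp}\,|c_2|](\llbracket c_1\rrbracket, sv') : \mathrm{interp}\,|c_1| \to \mathrm{interp}\,|c_2|\ \ \mathrm{opr}_{32}$$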

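4. If $i = \mathrm{boxf}\ fv$ then

By inversion of assumptions:

$$\Delta; \Gamma \vdash fv : \mathrm{Float}$$
$$\Delta; \Gamma \vdash fv : \mathrm{Float} \leadsto fv'$$

By theorem 6:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash fv' : \mathrm{Float}$$

By the box rule:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \mathsf{box}\ fv' : \mathrm{Boxed}\ \mathrm{Float}$$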

5.  If  i  sv  then 

By  inversion  of  assumptions: 

A  h  Sum^  (^  :  T32 

A; r  h  s?; :  Sum)'(ci, . . .  ,Cj, . . .  ,Cn) 

A;r  h  s?; :  Sum^(cj, . . .  ,Cj, . . .  ,Cn) sv' 

By  theorem  5: 

4'i;  |A|;  JAt,  |r|  h  s?;' :  |  Sum|(ci, . . . ,  cj, . . . ,  Cn)\ 

By  definition: 

I  Sum^(ci, . . .  ,Cj, . . .  ,Cn)|  =  X [Tag(j), interp  |cj|] 

Hence  by  the  selection  rule: 

'kj;  |A|;  JAl.,  |r|  h  select^  sv' :  interp  |cj| 

(Note  the  implicit  type  inclusion,  hence  interp  |cj|  =  |T(cj)|) 

6.  If  i  =  ini'!  ,  ^  sv  then 

By  inverting  the  assumptions: 

A  h  Sumj(ci, . . .  ,Cj,. . .  ,Cn):  T32 

A;  r  h  SI! :  Cj 

A;  r  h  SI! :  Cj  sv' 

By  theorem  5: 

|A|;JA1,,  |r|  h  sv' :  \cj\ 
By the pair and tag rules:

|A|;  JAt,  |r|  h  (tag^.,s?;')  :  x  [Tag(j), \cj\] 

By  definition: 

I  Sumj(ci,  ...,Cn)\=  V[Tag(0), . . .  ,Tag(f  -  1), ,  x[Tag(f),  |ci|],. . . ,  x[Tag(n),  |cn|]] 

By  the  union  injection  rule: 

|A|;JAt,  |r|  H  inj_union|g^^(^|(tag^.,s?;') :  |  Suini(c)| 

7.  If  i  =  {sv[, . . . ,  then: 

By  inverting  assumptions: 

A;  r  h  srij :  Ti  i  G  1 . . .  n 
A;  r  h  svi :  Ti  sv'^  i  G  1 . .  .n 

By  theorem  5: 

|A|;  JAt,  |r|  h  s?;' :  |ri|  iGl...n 
By  the  tuple  rule: 

|A|;  JAt,  |r|  h  :  x  [|ri|, . . . ,  |rn|] 

8.  If  i  =  select*  sv  then: 

By  inverting  assumptions: 

A;  r  h  St! :  n  X  . . .  X  Tn 
A;  r  h  St! :  Ti  X  . . .  X  Tn  s?;' 

By  theorem  5: 

|A|;  JAt,  |r|  h  s?;' :  In  x  . . .  x  nl 

By  the  select  rule: 

'hj;  |A|;  JAt,  |r|  h  select*  sv' :  |n| 

9.  If  i  =  caseT-(s?;)  (xi.ei, . . . ,  Xn-en)  then: 

By  inverting  assumptions: 

A  h  t:T32 

A;  r  h  St! :  Sumj(c) 

A;r[xj  :  Sum^(c)]  h  e^- :  r 
A;  r  h  St! :  Sumj(c)  ^  sv' 

A;  r[xj  :  Sunil  (^]  h  :  r  ^  e'- 

By  theorem  4: 

|A|  h  |t|:T32 

By  theorem  5: 

4'i;  |A|;JAl,,  |r|  h  s?;' :  |  Sumi(c)| 

By  induction: 

|A|;JAl,,  |r[xj:Sum^(c)]|  he'- :|r|  exp 
Note  that: 

|r[xj:Sum^(c)]|  =  |r|[xj:|  Suin^(c)|] 

I  Sumj(ci, . . .  ,Cn)|  =  V[Tag(0), . . .  ,Tag(f  -  1), ... ,  x[Tag(f),  |ci|],. . . ,  x[Tag(n),  |cn|]] 
I  Suin^(ci, . . . ,  Cn)|  =  Tag(j)  j  <  i 
I  Sum^(ci, . . .  ,Cn)|  =  x[Tag(j),Cj]  i  <j  <n 
Hence by the case introduction rule:

|A|;  JAt,  |r|  h  case|^|(s?;')  {xi.e[, . .  .,Xn.e'J  :  |t|  exp 

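10. If $i = \mathrm{handle}_\tau(e_1, x.e_2)$ then:

By inverting assumptions:

$$\Delta \vdash \tau : T_{32}$$
$$\Delta; \Gamma \vdash e_1 : \tau$$
$$\Delta; \Gamma[x{:}\mathrm{Exn}] \vdash e_2 : \tau$$
$$\Delta; \Gamma \vdash e_1 : \tau \leadsto e_1'$$
$$\Delta; \Gamma[x{:}\mathrm{Exn}] \vdash e_2 : \tau \leadsto e_2'$$

By theorem 4:

$$|\Delta| \vdash |\tau| : T_{32}$$

By induction:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash e_1' : |\tau|\ \ \mathrm{exp}$$
$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma[x{:}\mathrm{Exn}]| \vdash e_2' : |\tau|\ \ \mathrm{exp}$$

By definition:

$$|\Gamma[x{:}\mathrm{Exn}]| = |\Gamma|[x{:}\mathrm{Dyn}]$$

So by the handle rule:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \mathrm{handle}_{|\tau|}(e_1', x.e_2') : |\tau|\ \ \mathrm{opr}_{32}$$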

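11. If $i = sv[c_1, \ldots, c_n](sv_1, \ldots, sv_m)(fv_1, \ldots, fv_k)$ then:

By inverting assumptions:

$$\Delta; \Gamma \vdash sv : \forall[\alpha_1{::}\kappa_1, \ldots, \alpha_n{::}\kappa_n](\tau_1, \ldots, \tau_m)(k) \to \tau$$
$$\Delta; \Gamma \vdash sv_i : \tau_i[c_1/\alpha_1, \ldots, c_n/\alpha_n]$$
$$\Delta; \Gamma \vdash fv_i : \mathrm{Float}$$
$$\Delta; \Gamma \vdash sv : \forall[\alpha_1{::}\kappa_1, \ldots, \alpha_n{::}\kappa_n](\tau_1, \ldots, \tau_m)(k) \to \tau \leadsto sv'$$
$$\Delta; \Gamma \vdash sv_i : \tau_i[c_1/\alpha_1, \ldots, c_n/\alpha_n] \leadsto sv_i'$$
$$\Delta; \Gamma \vdash fv_i : \mathrm{Float} \leadsto fv_i'$$

By theorems 5 and 6:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv' : |\forall[\alpha_1{::}\kappa_1, \ldots, \alpha_n{::}\kappa_n](\tau_1, \ldots, \tau_m)(k) \to \tau|$$
$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv_i' : |\tau_i[c_1/\alpha_1, \ldots, c_n/\alpha_n]|$$
$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash fv_i' : \mathrm{Float}$$

By definition:

$$|\forall[\alpha_1{::}\kappa_1, \ldots, \alpha_n{::}\kappa_n](\tau_1, \ldots, \tau_m)(k) \to \tau| = \forall[\alpha_1{::}|\kappa_1|, \ldots, \alpha_n{::}|\kappa_n|](\llbracket\alpha_1{:}\kappa_1\rrbracket, \ldots, \llbracket\alpha_n{:}\kappa_n\rrbracket, |\tau_1|, \ldots, |\tau_m|)(\mathrm{Float}_0, \ldots, \mathrm{Float}_{k-1}) \to |\tau|$$

By theorem 2:

$$|\Delta| \vdash |c_i| : |\kappa_i| \qquad i \in 1 \ldots n$$

Therefore by the type instantiation rule:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv'[|c_1|, \ldots, |c_n|] : ((\llbracket\alpha_1{:}\kappa_1\rrbracket, \ldots, \llbracket\alpha_n{:}\kappa_n\rrbracket, |\tau_1|, \ldots, |\tau_m|)(\mathrm{Float}_0, \ldots, \mathrm{Float}_{k-1}) \to |\tau|)[|c_1|/\alpha_1, \ldots, |c_n|/\alpha_n]$$

To apply the function application rule, it suffices to show that:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \llbracket c_i\rrbracket : \llbracket\alpha_i{:}\kappa_i\rrbracket[|c_1|/\alpha_1, \ldots, |c_n|/\alpha_n] \qquad i \in 1 \ldots n$$

and

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv_i' : |\tau_i|[|c_1|/\alpha_1, \ldots, |c_n|/\alpha_n] \qquad i \in 1 \ldots m$$

By theorem 3:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \llbracket c_i\rrbracket : \llbracket c_i{:}\kappa_i\rrbracket \qquad i \in 1 \ldots n$$

By lemma 13:

$$\llbracket\alpha_i{:}\kappa_i\rrbracket[|c_1|/\alpha_1, \ldots, |c_n|/\alpha_n] = \llbracket\alpha_i[c_1/\alpha_1, \ldots, c_n/\alpha_n]{:}\kappa_i\rrbracket = \llbracket c_i{:}\kappa_i\rrbracket$$

We know from above that:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv_i' : |\tau_i[c_1/\alpha_1, \ldots, c_n/\alpha_n]| \qquad i \in 1 \ldots m$$

But by lemma 9:

$$|\tau_i|[|c_1|/\alpha_1, \ldots, |c_n|/\alpha_n] = |\tau_i[c_1/\alpha_1, \ldots, c_n/\alpha_n]|$$

Therefore, by the application rule:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv'[|c_1|, \ldots, |c_n|](\llbracket c_1\rrbracket, \ldots, \llbracket c_n\rrbracket, sv_1', \ldots, sv_m')(fv_1', \ldots, fv_k') : |\tau|[|c_1|/\alpha_1, \ldots, |c_n|/\alpha_n]$$

Finally, note that by lemma 9:

$$|\tau|[|c_1|/\alpha_1, \ldots, |c_n|/\alpha_n] = |\tau[c_1/\alpha_1, \ldots, c_n/\alpha_n]|$$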

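12. If $i = \mathrm{raise}_\tau\ sv$ then:

By inverting assumptions:

$$\Delta \vdash \tau : T_{32}$$
$$\Delta; \Gamma \vdash sv : \mathrm{Exn}$$
$$\Delta; \Gamma \vdash sv : \mathrm{Exn} \leadsto sv'$$

By theorem 4:

$$|\Delta| \vdash |\tau| : T_{32}$$

By theorem 5:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv' : |\mathrm{Exn}|$$

Hence by the raise introduction rule:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \mathrm{raise}_{|\tau|}\ sv' : |\tau|$$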

13.  If  i  =  mkexntag^  then: 

By  inverting  assumptions: 

A  h  c:  T32 

A  h  T  ok 

By  theorem  4: 

|A|  h  |c|:T32 

Hence  by  the  dyntag  intro  rule: 

4'*;  |A|;  JAt,  |r|  G  dyntag|^l  :  Dyntag(|c|) 

14.  If  i  =  exncaseT-(s?;)  (s?;i  ^  xi.ei,  _  ^  62)  then: 
By inverting assumptions:

A;  r  h  St! :  Exn 

A  h  t:T32 

A;  r  h  s?;i :  Dyntag^^ 

A;r[xi:ci]  h  d  :  r 

A;r  h  62  :r 

A;  r  h  St! :  Exn  sv' 

A;  r  h  sill :  Dyntag^^  sv'i 
A;  r[xi:ci]  h  ei :  r  e'^ 

A;  r  h  62  :  r  62 

By  theorem  4: 

|A|  h  |r|:T32 

By  theorem  5: 

4'*;  |A|;JA1,,  |r|  h  sv' :  |Exn| 

lAlUAt,  |r|  h  sv'^ :  iDyntag^^  | 

By  induction: 

4'*;  |A|;  JAt,  |r[xi:ci]|  h  63 :  |t|  exp 
lAI;  JAt,  |r|  h  62  :  |t|  exp 

Hence  by  the  dyncase  introduction  rule: 

4'i;  |A|;  JAt,  |r|  H  dyncase^(s?;)  {sv'^  63)  :  |r|  oprgj 

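15. If $i = \mathrm{sub}_c(sv_1, sv_2)$ then:

By inverting assumptions:

$$\Delta \vdash c : T_{32}$$
$$\Delta; \Gamma \vdash sv_1 : \mathrm{Array}_c$$
$$\Delta; \Gamma \vdash sv_2 : \mathrm{Int}$$
$$\Delta; \Gamma \vdash sv_1 : \mathrm{Array}_c \leadsto sv_1'$$
$$\Delta; \Gamma \vdash sv_2 : \mathrm{Int} \leadsto sv_2'$$

By theorem 4:

$$|\Delta| \vdash |c| : T_{32}$$

By theorem 5:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv_1' : |\mathrm{Array}_c|$$
$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv_2' : |\mathrm{Int}|$$

By theorem 3:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \llbracket c\rrbracket : R(|c|)$$

By lemma 22:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \mathrm{sub} : \forall[\alpha{:}T_{mil}]\ \mathrm{Code}[R(\alpha), \mathrm{Array}(\alpha), \mathrm{Int}][\,](\mathrm{interp}(\alpha))$$

Hence by the code call rule:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \mathsf{call}\ \mathrm{sub}[|c|](\llbracket c\rrbracket, sv_1', sv_2') : \mathrm{interp}\,|c|\ \ \mathrm{opr}_{32}$$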

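16. If $i = \mathrm{upd}_c(sv_1, sv_2, sv_3)$ then:

By inverting assumptions:

$$\Delta \vdash c : T_{32}$$
$$\Delta; \Gamma \vdash sv_1 : \mathrm{Array}_c$$
$$\Delta; \Gamma \vdash sv_2 : \mathrm{Int}$$
$$\Delta; \Gamma \vdash sv_3 : c$$
$$\Delta; \Gamma \vdash sv_1 : \mathrm{Array}_c \leadsto sv_1'$$
$$\Delta; \Gamma \vdash sv_2 : \mathrm{Int} \leadsto sv_2'$$
$$\Delta; \Gamma \vdash sv_3 : c \leadsto sv_3'$$

By theorem 4:

$$|\Delta| \vdash |c| : T_{32}$$

By theorem 5:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv_1' : |\mathrm{Array}_c|$$
$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv_2' : |\mathrm{Int}|$$
$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash sv_3' : |c|$$

By theorem 3:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \llbracket c\rrbracket : R(|c|)$$

By lemma 22:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \mathrm{upd} : \forall[\alpha{:}T_{mil}]\ \mathrm{Code}[R(\alpha), \mathrm{Array}(\alpha), \mathrm{Int}, \mathrm{interp}(\alpha)][\,](\mathrm{unit})$$

Hence by the code call rule:

$$\Psi_i; |\Delta|; \llbracket\Delta\rrbracket, |\Gamma| \vdash \mathsf{call}\ \mathrm{upd}[|c|](\llbracket c\rrbracket, sv_1', sv_2', sv_3') : \mathrm{Unit}\ \ \mathrm{opr}_{32}$$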

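17. If $i = \mathrm{array}_c(sv_1, sv_2)$ then:

By inverting assumptions:

$\Delta \vdash c : T_{32}$
$\Delta; \Gamma \vdash sv_1 : \mathrm{Int}$
$\Delta; \Gamma \vdash sv_2 : c$
$\Delta; \Gamma \vdash sv_1 : \mathrm{Int} \leadsto sv_1'$
$\Delta; \Gamma \vdash sv_2 : c \leadsto sv_2'$

By theorem 4:

$|\Delta| \vdash |c| : T_{32}$

By theorem 5:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash sv_1' : |\mathrm{Int}|$
$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash sv_2' : |c|$

By theorem 3:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash [\![c]\!] : R(|c|)$

By lemma 22:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash \mathrm{array} : \forall[\alpha{:}T_{mil}]\ \mathrm{Code}[R(\alpha), \mathrm{Int}, \mathrm{interp}(\alpha)][\,](\mathrm{Array}(\alpha))$

Hence by the code call rule:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash \mathrm{call}\ \mathrm{array}[|c|]([\![c]\!], sv_1', sv_2') : \mathrm{Array}(\mathrm{interp}|c|)\ \mathrm{opr}_{32}$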

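18. If $i = \mathrm{fupd}(sv_1, sv_2, fv)$ then:

By inverting assumptions:

$\Delta; \Gamma \vdash sv_1 : \mathrm{Farray}$
$\Delta; \Gamma \vdash sv_2 : \mathrm{Int}$
$\Delta; \Gamma \vdash fv : \mathrm{Float}$
$\Delta; \Gamma \vdash sv_1 : \mathrm{Farray} \leadsto sv_1'$
$\Delta; \Gamma \vdash sv_2 : \mathrm{Int} \leadsto sv_2'$
$\Delta; \Gamma \vdash fv : \mathrm{Float} \leadsto fv'$

By theorems 5 and 6:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash sv_1' : |\mathrm{Farray}|$
$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash sv_2' : \mathrm{Int}$
$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash fv' : \mathrm{Float}$

By the 64 bit array update rule:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash \mathrm{upd}_{\mathrm{Float}}(sv_1', sv_2', fv') : \mathrm{Unit}\ \mathrm{opr}_{32}$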

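19. If $i = \mathrm{farray}(sv, fv)$ then:

By inverting assumptions:

$\Delta; \Gamma \vdash sv : \mathrm{Int}$
$\Delta; \Gamma \vdash fv : \mathrm{Float}$
$\Delta; \Gamma \vdash sv : \mathrm{Int} \leadsto sv'$
$\Delta; \Gamma \vdash fv : \mathrm{Float} \leadsto fv'$

By theorems 5 and 6:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash sv' : \mathrm{Int}$
$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash fv' : \mathrm{Float}$

By the 64 bit array creation rule:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash \mathrm{array}_{\mathrm{Float}}(sv', fv') : \mathrm{Farray}(\mathrm{Float})\ \mathrm{opr}_{32}$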

For  expressions 

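1. If $e = sv$ then:

By inversion of assumptions:

$\Delta; \Gamma \vdash sv : \tau$
$\Delta; \Gamma \vdash sv : \tau \leadsto sv'$

By theorem 5:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash sv' : |\tau|$

By the small value expression inclusion rule:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash sv' : |\tau|\ \mathrm{exp}$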

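2. If $e = \mathrm{let}_\tau\ x = i\ \mathrm{in}\ e$ then:

By inversion of assumptions:

$\Delta; \Gamma \vdash i : \tau_1$
$\Delta; \Gamma, x{:}\tau_1 \vdash e : \tau$
$\Delta; \Gamma \vdash i : \tau_1 \leadsto i'$
$\Delta; \Gamma, x{:}\tau_1 \vdash e : \tau \leadsto e'$

By induction:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash i' : |\tau_1|\ \mathrm{opr}_{32}$
$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma|, x{:}|\tau_1| \vdash e' : |\tau|\ \mathrm{exp}$ (Note that $|\Gamma, x{:}\tau_1| = |\Gamma|, x{:}|\tau_1|$)

By the operation rule:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash \mathrm{let}_{|\tau|}\ x = i'\ \mathrm{in}\ e' : |\tau|\ \mathrm{exp}$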

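3. If $e = \mathrm{let}_\tau\ x_f = i\ \mathrm{in}\ e$ then:

By inversion of assumptions:

$\Delta; \Gamma \vdash i : \mathrm{Float}$
$\Delta; \Gamma, x_f \vdash e : \tau$
$\Delta; \Gamma \vdash i : \mathrm{Float} \leadsto i'$
$\Delta; \Gamma, x_f \vdash e : \tau \leadsto e'$

By induction and theorem 7:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash i' : \mathrm{Float}\ \mathrm{opr}_{64}$
$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma|, x_f{:}\mathrm{Float} \vdash e' : |\tau|\ \mathrm{exp}$ (Note that $|\Gamma, x_f| = |\Gamma|, x_f{:}\mathrm{Float}$)

By the 64 bit operation rule:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash \mathrm{let}_{|\tau|}\ x_f = i'\ \mathrm{in}\ e' : |\tau|\ \mathrm{exp}$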

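4. If $e = \mathrm{let}_\tau\ \mathrm{rec}_{\tau_r}\ f[\overline{\alpha{:}\kappa}](\overline{x{:}\tau})(\overline{x_f}).e_f\ \mathrm{in}\ e$ then:

By inversion of assumptions:

$\Delta, \overline{\alpha{:}\kappa};\ \Gamma[f : \forall[\overline{\alpha{:}\kappa}](\overline{\tau})(|\overline{x_f}|) \rightarrow \tau_r,\ \overline{x{:}\tau},\ \overline{x_f}] \vdash e_f : \tau_r$
$\Delta;\ \Gamma[f : \forall[\overline{\alpha{:}\kappa}](\overline{\tau})(|\overline{x_f}|) \rightarrow \tau_r] \vdash e : \tau$
$\Delta, \overline{\alpha{:}\kappa};\ \Gamma[f : \forall[\overline{\alpha{:}\kappa}](\overline{\tau})(|\overline{x_f}|) \rightarrow \tau_r,\ \overline{x{:}\tau},\ \overline{x_f}] \vdash e_f : \tau_r \leadsto e_f'$
$\Delta;\ \Gamma[f : \forall[\overline{\alpha{:}\kappa}](\overline{\tau})(|\overline{x_f}|) \rightarrow \tau_r] \vdash e : \tau \leadsto e'$

By induction:

$\Psi_i; |\Delta, \overline{\alpha{:}\kappa}|; [\![\Delta, \overline{\alpha{:}\kappa}]\!], |\Gamma[f : \forall[\overline{\alpha{:}\kappa}](\overline{\tau})(|\overline{x_f}|) \rightarrow \tau_r,\ \overline{x{:}\tau},\ \overline{x_f}]| \vdash e_f' : |\tau_r|\ \mathrm{exp}$
$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma[f : \forall[\overline{\alpha{:}\kappa}](\overline{\tau})(|\overline{x_f}|) \rightarrow \tau_r]| \vdash e' : |\tau|\ \mathrm{exp}$

By definition:

$[\![\Delta, \overline{\alpha{:}\kappa}]\!] = [\![\Delta]\!],\ \overline{x_\alpha : [\![\alpha{::}\kappa]\!]}$

And by definition:

$|\Gamma[f : \forall[\overline{\alpha{:}\kappa}](\overline{\tau})(|\overline{x_f}|) \rightarrow \tau_r,\ \overline{x{:}\tau},\ \overline{x_f}]|$
$\quad = |\Gamma|[f : |\forall[\overline{\alpha{:}\kappa}](\overline{\tau})(|\overline{x_f}|) \rightarrow \tau_r|,\ \overline{x{:}|\tau|},\ \overline{x_f{:}\mathrm{Float}}]$
$\quad = |\Gamma|[f : \forall[\overline{\alpha{::}|\kappa|}](\overline{[\![\alpha{::}\kappa]\!]}, \overline{|\tau|})(\overline{\mathrm{Float}}) \rightarrow |\tau_r|,\ \overline{x{:}|\tau|},\ \overline{x_f{:}\mathrm{Float}}]$

Hence by the rec rule:

$\Psi_i; |\Delta|; [\![\Delta]\!], |\Gamma| \vdash \mathrm{let}_{|\tau|}\ \mathrm{rec}_{|\tau_r|}\ f[\overline{\alpha{::}|\kappa|}](\overline{x_\alpha : [\![\alpha{::}\kappa]\!]}, \overline{x{:}|\tau|})(\overline{x_f{:}\mathrm{Float}}).e_f'\ \mathrm{in}\ e' : |\tau|$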


Programs 

The correctness of the program translation follows almost immediately.

Theorem 9 (Programs)

If $\bullet \vdash e : \tau$ and $\vdash e : \tau \leadsto \mathrm{let}\ d_i\ \mathrm{in}\ e'\ \mathrm{prog}$ then $\bullet \vdash \mathrm{let}\ d_i\ \mathrm{in}\ e' : |\tau|$.

Proof: (By construction)

By lemma 21 and the heap well-formedness rule:

$\Psi_i \vdash d_i\ \mathrm{ok}$

By theorem 8:

$\Psi_i; \bullet \vdash e' : |\tau|\ \mathrm{exp}$

So by the program well-formedness rule:

$\bullet \vdash \mathrm{let}\ d_i\ \mathrm{in}\ e' : |\tau|$


5.6  The  complete  term  level  translation 

Small Values

$\Delta; \Gamma \vdash sv : \tau \leadsto sv'$

$\Delta; \Gamma[x{:}\tau] \vdash x : \tau \leadsto x$

$\Delta; \Gamma \vdash i : \mathrm{Int} \leadsto i$


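$$\frac{\Delta \vdash c \equiv \pi_i\,\mu(\alpha,\beta)(c_1,c_2) : T_{32} \qquad \Delta; \Gamma \vdash sv : \pi_i\,\mu(\alpha,\beta)(c_1,c_2) \leadsto sv'}{\Delta; \Gamma \vdash \mathrm{unroll}_c\ sv : c_i[\pi_1\,\mu(\alpha,\beta)(c_1,c_2)/\alpha,\ \pi_2\,\mu(\alpha,\beta)(c_1,c_2)/\beta] \leadsto \mathrm{unroll}_{\mathrm{interp}|c|}\ sv'}$$

$$\frac{\Delta \vdash c \equiv \pi_i\,\mu(\alpha,\beta)(c_1,c_2) : T_{32} \qquad \Delta; \Gamma \vdash sv : c_i[\pi_1\,\mu(\alpha,\beta)(c_1,c_2)/\alpha,\ \pi_2\,\mu(\alpha,\beta)(c_1,c_2)/\beta] \leadsto sv'}{\Delta; \Gamma \vdash \mathrm{roll}_c\ sv : \pi_i\,\mu(\alpha,\beta)(c_1,c_2) \leadsto \mathrm{roll}_{|c|}\ sv'}$$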


A;r  h  inj_tag^g^^(^-^ :  Sum^(^  ^  inj_union|  tag^- 


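Float values

$\Delta; \Gamma \vdash fv : \mathrm{Float} \leadsto fv'$

$\Delta; \Gamma \vdash r : \mathrm{Float} \leadsto r$

$\Delta; \Gamma[x_f] \vdash x_f : \mathrm{Float} \leadsto x_f$

64 bit operations

$\Delta; \Gamma \vdash i : \mathrm{Float} \leadsto i'$

$$\frac{\Delta; \Gamma \vdash fv : \mathrm{Float} \leadsto fv'}{\Delta; \Gamma \vdash fv : \mathrm{Float} \leadsto fv'}$$

$$\frac{\Delta; \Gamma \vdash sv : \mathrm{Boxf} \leadsto sv'}{\Delta; \Gamma \vdash \mathrm{unboxf}\ sv : \mathrm{Float} \leadsto \mathrm{unbox}\ sv'}$$

$$\frac{\Delta; \Gamma \vdash sv_1 : \mathrm{Farray} \leadsto sv_1' \qquad \Delta; \Gamma \vdash sv_2 : \mathrm{Int} \leadsto sv_2'}{\Delta; \Gamma \vdash \mathrm{fsub}(sv_1, sv_2) : \mathrm{Float} \leadsto \mathrm{sub}_{\mathrm{Float}}(sv_1', sv_2')}$$

Operations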


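$\Delta; \Gamma \vdash i : \tau \leadsto i'$

$$\frac{\Delta; \Gamma \vdash sv : \tau \leadsto sv'}{\Delta; \Gamma \vdash sv : \tau \leadsto sv'}$$

$$\frac{\Delta; \Gamma \vdash sv : c_1 \rightarrow c_2 \leadsto sv'}{\Delta; \Gamma \vdash \mathrm{vararg}_{c_1 \rightarrow c_2}\ sv : \mathrm{Vararg}_{c_1 \rightarrow c_2} \leadsto \mathrm{call}\ \mathrm{vararg}[|c_1|, \mathrm{interp}\,|c_2|]([\![c_1]\!], sv')}$$

$$\frac{\Delta; \Gamma \vdash sv : \mathrm{Vararg}_{c_1 \rightarrow c_2} \leadsto sv'}{\Delta; \Gamma \vdash \mathrm{onearg}_{c_1 \rightarrow c_2}\ sv : c_1 \rightarrow c_2 \leadsto \mathrm{call}\ \mathrm{onearg}[|c_1|, \mathrm{interp}\,|c_2|]([\![c_1]\!], sv')}$$

$$\frac{\Delta; \Gamma \vdash fv : \mathrm{Float} \leadsto fv'}{\Delta; \Gamma \vdash \mathrm{boxf}\ fv : \mathrm{Boxf} \leadsto \mathrm{box}\ fv'}$$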

A;  r  h  St! :  Sum^  (c)  ^  sv' 

A;  r  H  sv  :  Cj  ^  select^  sv' 

A;  r  h  si; :  Cj  sv' 

h  sv  :  Sum^(^  inj .union,  s^^(^|(tag^.,  s?;') 
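$$\frac{\Delta; \Gamma \vdash sv_i : \tau_i \leadsto sv_i'}{\Delta; \Gamma \vdash (sv_1, \ldots, sv_n) : \tau_1 \times \cdots \times \tau_n \leadsto (sv_1', \ldots, sv_n')}$$

$$\frac{\Delta; \Gamma \vdash sv : \tau_1 \times \cdots \times \tau_n \leadsto sv'}{\Delta; \Gamma \vdash \mathrm{select}^i\ sv : \tau_i \leadsto \mathrm{select}^i\ sv'}$$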

A;  r  h  SI! :  Sumj(^  sv'  A;  T[xj  :  Sum^(^]  h  :  r  e'- 
A;  r  h  case^(s?;)  (xi.ei, . . . ,  Xn-en)  '■  t  ^  case|^|(s?;')  {xi.e'^, ...,  x^.e^) 

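$$\frac{\Delta; \Gamma \vdash e_1 : \tau \leadsto e_1' \qquad \Delta; \Gamma[x : \mathrm{Exn}] \vdash e_2 : \tau \leadsto e_2'}{\Delta; \Gamma \vdash \mathrm{handle}_\tau(e_1, x.e_2) : \tau \leadsto \mathrm{handle}_{|\tau|}(e_1', x.e_2')}$$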

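$$\frac{\begin{array}{c}\Delta; \Gamma \vdash sv : \forall[\alpha_1{::}\kappa_1, \ldots, \alpha_n{::}\kappa_n](\tau_1, \ldots, \tau_m)(k) \rightarrow \tau \leadsto sv' \\ \Delta; \Gamma \vdash sv_i : \tau_i[c_1/\alpha_1, \ldots, c_n/\alpha_n] \leadsto sv_i' \qquad \Delta; \Gamma \vdash fv_i : \mathrm{Float} \leadsto fv_i'\end{array}}{\begin{array}{l}\Delta; \Gamma \vdash sv[c_1, \ldots, c_n](sv_1, \ldots, sv_m)(fv_1, \ldots, fv_k) : \tau[c_1/\alpha_1, \ldots, c_n/\alpha_n] \leadsto \\ \qquad sv'[|c_1|, \ldots, |c_n|]([\![c_1]\!], \ldots, [\![c_n]\!], sv_1', \ldots, sv_m')(fv_1', \ldots, fv_k')\end{array}}$$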

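$$\frac{\Delta; \Gamma \vdash sv : \mathrm{Exn} \leadsto sv'}{\Delta; \Gamma \vdash \mathrm{raise}_\tau\ sv : \tau \leadsto \mathrm{raise}_{|\tau|}\ sv'}$$

$$\frac{\Delta; \Gamma \vdash sv_1 : \mathrm{Dyntag}_c \leadsto sv_1' \qquad \Delta; \Gamma \vdash sv_2 : c \leadsto sv_2'}{\Delta; \Gamma \vdash \mathrm{inj\_dyn}_c(sv_1, sv_2) : \mathrm{Exn} \leadsto \mathrm{pack}\ (sv_1', sv_2')\ \mathrm{as}\ \exists\alpha{:}T_{32}.(\mathrm{Dyntag}(\alpha) \times \alpha)\ \mathrm{hiding}\ |c|}$$

$\Delta; \Gamma \vdash \mathrm{mkexntag}_c : \mathrm{Dyntag}_c \leadsto \mathrm{dyntag}_{|c|}$

$$\frac{\begin{array}{c}\Delta; \Gamma \vdash sv : \mathrm{Exn} \leadsto sv' \qquad \Delta; \Gamma \vdash sv_1 : \mathrm{Dyntag}_{c_1} \leadsto sv_1' \\ \Delta; \Gamma[x_1{:}c_1] \vdash e_1 : \tau \leadsto e_1' \qquad \Delta; \Gamma \vdash e_2 : \tau \leadsto e_2'\end{array}}{\Delta; \Gamma \vdash \mathrm{exncase}_\tau(sv)\ (sv_1 \Rightarrow x_1.e_1,\ \_ \Rightarrow e_2) : \tau \leadsto \mathrm{dyncase}_{|\tau|}(sv')\ (sv_1' \Rightarrow x_1.e_1',\ \_ \Rightarrow e_2')}$$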

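$$\frac{\Delta; \Gamma \vdash sv_1 : \mathrm{Array}_c \leadsto sv_1' \qquad \Delta; \Gamma \vdash sv_2 : \mathrm{Int} \leadsto sv_2'}{\Delta; \Gamma \vdash \mathrm{sub}_c(sv_1, sv_2) : c \leadsto \mathrm{call}\ \mathrm{sub}[|c|]([\![c]\!], sv_1', sv_2')}$$

$$\frac{\Delta; \Gamma \vdash sv_1 : \mathrm{Array}_c \leadsto sv_1' \qquad \Delta; \Gamma \vdash sv_2 : \mathrm{Int} \leadsto sv_2' \qquad \Delta; \Gamma \vdash sv_3 : c \leadsto sv_3'}{\Delta; \Gamma \vdash \mathrm{upd}_c(sv_1, sv_2, sv_3) : \mathrm{Unit} \leadsto \mathrm{call}\ \mathrm{upd}[|c|]([\![c]\!], sv_1', sv_2', sv_3')}$$

$$\frac{\Delta; \Gamma \vdash sv_1 : \mathrm{Farray} \leadsto sv_1' \qquad \Delta; \Gamma \vdash sv_2 : \mathrm{Int} \leadsto sv_2' \qquad \Delta; \Gamma \vdash fv : \mathrm{Float} \leadsto fv'}{\Delta; \Gamma \vdash \mathrm{fupd}(sv_1, sv_2, fv) : \mathrm{Unit} \leadsto \mathrm{upd}_{\mathrm{Float}}(sv_1', sv_2', fv')}$$

$$\frac{\Delta; \Gamma \vdash sv_1 : \mathrm{Int} \leadsto sv_1' \qquad \Delta; \Gamma \vdash sv_2 : c \leadsto sv_2'}{\Delta; \Gamma \vdash \mathrm{array}_c(sv_1, sv_2) : \mathrm{Array}_c \leadsto \mathrm{call}\ \mathrm{array}[|c|]([\![c]\!], sv_1', sv_2')}$$

$$\frac{\Delta; \Gamma \vdash sv : \mathrm{Int} \leadsto sv' \qquad \Delta; \Gamma \vdash fv : \mathrm{Float} \leadsto fv'}{\Delta; \Gamma \vdash \mathrm{farray}(sv, fv) : \mathrm{Farray} \leadsto \mathrm{array}_{\mathrm{Float}}(sv', fv')}$$


Expressions


$\Delta; \Gamma \vdash e : \tau \leadsto e'$

$$\frac{\Delta; \Gamma \vdash sv : \tau \leadsto sv'}{\Delta; \Gamma \vdash sv : \tau \leadsto sv'}$$

$$\frac{\Delta; \Gamma \vdash i : \tau' \leadsto i' \qquad \Delta; \Gamma[x : \tau'] \vdash e : \tau \leadsto e'}{\Delta; \Gamma \vdash \mathrm{let}_\tau\ x = i\ \mathrm{in}\ e : \tau \leadsto \mathrm{let}_{|\tau|}\ x = i'\ \mathrm{in}\ e'}$$

$$\frac{\Delta; \Gamma \vdash i : \mathrm{Float} \leadsto i' \qquad \Delta; \Gamma[x_f] \vdash e : \tau \leadsto e'}{\Delta; \Gamma \vdash \mathrm{let}_\tau\ x_f = i\ \mathrm{in}\ e : \tau \leadsto \mathrm{let}_{|\tau|}\ x_f = i'\ \mathrm{in}\ e'}$$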

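$$\frac{\begin{array}{c}\Delta, \overline{\alpha{:}\kappa};\ \Gamma[f : \forall[\overline{\alpha{:}\kappa}](\overline{\tau})(|\overline{x_f}|) \rightarrow \tau_r,\ \overline{x{:}\tau},\ \overline{x_f}] \vdash e_f : \tau_r \leadsto e_f' \\ \Delta;\ \Gamma[f : \forall[\overline{\alpha{:}\kappa}](\overline{\tau})(|\overline{x_f}|) \rightarrow \tau_r] \vdash e : \tau \leadsto e'\end{array}}{\begin{array}{l}\Delta; \Gamma \vdash \mathrm{let}_\tau\ \mathrm{rec}_{\tau_r}\ f[\overline{\alpha{:}\kappa}](\overline{x{:}\tau})(\overline{x_f}).e_f\ \mathrm{in}\ e : \tau \leadsto \\ \qquad \mathrm{let}_{|\tau|}\ \mathrm{rec}_{|\tau_r|}\ f[\overline{\alpha{::}|\kappa|}](\overline{x_\alpha : [\![\alpha{::}\kappa]\!]}, \overline{x{:}|\tau|})(\overline{x_f{:}\mathrm{Float}}).e_f'\ \mathrm{in}\ e'\end{array}}$$

Chapter 6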


Closure  converted  LIL 


6.1  Introduction 

Closure  conversion  is  a  common  strategy  for  implementing  languages  with  higher  order  functions. 
The  essential  idea  is  to  replace  terms  that  evaluate  via  substitution  of  values  for  free  variables  with 
terms  that  can  be  evaluated  by  referring  to  an  environment  providing  definitions  for  free  variables. 
Since  the  latter  evaluation  model  matches  more  closely  that  of  actual  machines,  closure  conversion 
is  probably  the  most  standard  technique  used  for  compiling  higher-order  functions. 

In the LIL, closure conversion is the process by which function definitions are replaced with
existentially abstracted pairs of environments and code pointers, and by which function applications
are replaced with projections and code calls. Since the LIL is a typed language, the closure
conversion translation must maintain well-typedness of terms. This means, among other things,
that in addition to turning free term variables into additional function arguments, the closure
conversion translation must also turn free constructor variables into additional function constructor
arguments. However, since the LIL language admits a type erasure interpretation, the type
environments do not need to be represented at runtime, which simplifies the problem of typed closure
conversion.
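
As an untyped illustration of the idea just described, the following Python sketch converts one small function by hand. It is not LIL code: the names `Closure`, `adder_code`, `make_adder_converted`, and `apply_closure` are hypothetical, and the existential typing that hides the environment type is only mimicked by treating `env` as opaque.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Closure:
    """A closed code pointer paired with the environment it expects."""
    code: Callable[..., Any]  # takes the environment as its first argument
    env: Tuple[Any, ...]      # values of the former free variables


def make_adder_source(n):
    # Source-level version: the inner function mentions the free variable n.
    return lambda x: x + n


def adder_code(env, x):
    # Closed code: n is fetched from the environment, not from an enclosing scope.
    (n,) = env
    return x + n


def make_adder_converted(n):
    # The function definition becomes a (code, environment) pair.
    return Closure(code=adder_code, env=(n,))


def apply_closure(clo, *args):
    # An application becomes two projections followed by a code call.
    return clo.code(clo.env, *args)


result = apply_closure(make_adder_converted(3), 4)
```

Here `adder_code` has no free variables of its own, so it could be placed at the top level of a program, which is the property the translation relies on.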

The fact that LIL types can be erased at runtime means that in the underlying assembly
language semantics, polymorphic instantiation can be a value. In the MIL, by contrast, a polymorphic
instantiation corresponds to passing actual runtime data and hence cannot be thought of as a value
in the sense of requiring no computation (as opposed to being valuable in the sense of being freely
duplicable). This fact is important since it allows closure conversion to instantiate polymorphic
functions directly when closures are built, which simplifies the theoretical framework for typed
closure conversion.

There  is  extensive  literature  on  typed  closure  conversion  [CWM98,  MWCG97,  MMH96,  MH98], 
and  the  LIL  closure  conversion  algorithm  adds  nothing  essential  to  this  previous  work.  In  this 
chapter  I  will  briefly  describe  the  important  translation  steps  and  prove  a  soundness  theorem 
without  going  into  any  great  detail. 


6.2  The  closure  conversion  translation 

There are a number of different strategies possible for implementing closure conversion for recursive
functions. Morrisett and Harper [MH98] give a fairly comprehensive overview of the different
approaches, and in section 9.3 I discuss some of the trade-offs between the different approaches.
The translation as described here uses the “recursive code” translation, which implements closures
via simple existentials and pairs, but requires a primitive notion of self-calling code. Fortuitously
(by design), the LIL language supports all of these notions.

In  the  next  sections,  I  give  a  high-level  overview  of  each  part  of  the  translation,  and  list  a  few 
key  translation  rules.  In  general,  the  closure  conversion  translation  rewrites  terms  by  first  rewriting 
their  sub-terms,  and  then  reconstructing  the  term  intact.  Only  in  the  specific  cases  of  functions, 
function  types,  and  function  applications  is  a  change  made  to  the  immediate  structure  of  a  term 
or  type. 

6.2.1  Constructors 

Closure  conversion  does  not  change  kinds  in  any  way,  so  the  first  translation  I  discuss  is  the 
translation  of  constructors.  I  notate  the  translation  of  a  LIL  constructor  c  under  closure-conversion 
as  |c|.  Almost  every  case  of  the  constructor  translation  is  uninteresting  except  for  the  case  for  a 
polymorphic  function  type. 

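$$|\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n]. \rightarrow (c_1)(c_2)(c_3)| \;=\; \exists[\alpha_e{:}T_{32}].\ \times[\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n].\ \mathrm{Code}(\alpha_e :: |c_1|)(|c_2|)(|c_3|),\ \alpha_e]$$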

This  case  is  actually  quite  illuminating  since  it  clearly  demonstrates  the  closure  conversion  process, 
and  is  therefore  worth  exploring  more  closely. 

Recall the type of the arrow constructor in the LIL: $\rightarrow\ :\ T_{32}\mathrm{list} \rightarrow T_{64}\mathrm{list} \rightarrow T_{32} \rightarrow T_{32}$. The
types $c_1$, $c_2$, and $c_3$ are therefore the 32 bit argument type list, the 64 bit argument type list,
and the return type, respectively. The translation of the arrow type simply translates these types
compositionally, and then adds an additional type to the head of the 32 bit argument type list.
This type, $\alpha_e$, is the type of the environment containing the free variables of the function, which
is passed to it as an additional argument. I use the informal :: notation to mean addition to
the head of a list.

The type $\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n].\ \mathrm{Code}(\alpha_e :: |c_1|)(|c_2|)(|c_3|)$ therefore describes a polymorphic code
function which, in addition to the 32 bit arguments described by $|c_1|$, expects a first argument
described by $\alpha_e$. Referring to this type as $\tau_c$, the type $\times[\tau_c, \alpha_e]$ classifies a function of this type
paired with an actual environment of type $\alpha_e$.

What then is $\alpha_e$? The important point of the closure conversion translation in a typed setting
is that the types of the free variables of a function are not apparent from its type. The environment
type is abstract outside of the local scope where the function is defined. All that need be apparent
from the type is that the type of the environment expected by the function and the type of the
environment provided in the closure coincide.

In order to capture this abstraction, the standard technique is to use existential quantification.
This then is the last part of the translation process. The final type is produced by existentially
quantifying over the environment type, $\alpha_e$.
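
The constructor translation can be sketched as a recursive function over a small type syntax. The Python below is only an illustration under stated assumptions: the classes `Base`, `TVar`, `Arrow`, `Code`, `Pair`, and `Exists` and the fixed bound name `a_e` are hypothetical stand-ins for the LIL constructors, kinds are omitted, and source types are assumed never to mention `a_e` themselves.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Base:
    """A primitive type such as Int or Float."""
    name: str


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class Arrow:
    """Source type: forall[tyvars]. (args32)(args64) -> ret."""
    tyvars: Tuple[str, ...]
    args32: Tuple["Ty", ...]
    args64: Tuple["Ty", ...]
    ret: "Ty"


@dataclass(frozen=True)
class Code:
    """Target type of closed code: forall[tyvars]. Code(args32)(args64)(ret)."""
    tyvars: Tuple[str, ...]
    args32: Tuple["Ty", ...]
    args64: Tuple["Ty", ...]
    ret: "Ty"


@dataclass(frozen=True)
class Pair:
    fst: "Ty"
    snd: "Ty"


@dataclass(frozen=True)
class Exists:
    var: str
    body: "Ty"


Ty = Union[Base, TVar, Arrow, Code, Pair, Exists]

ENV = "a_e"  # assumed not to occur in source types


def translate(c):
    """Closure-convert a type: only Arrow changes shape."""
    if isinstance(c, Arrow):
        env = TVar(ENV)
        code = Code(
            c.tyvars,
            (env,) + tuple(translate(t) for t in c.args32),  # a_e :: |c1|
            tuple(translate(t) for t in c.args64),            # |c2|
            translate(c.ret),                                 # |c3|
        )
        return Exists(ENV, Pair(code, env))
    return c  # variables and base types are unchanged


int_ty = Base("Int")
source = Arrow(("a",), (TVar("a"), int_ty), (Base("Float"),), int_ty)
target = translate(source)
```

Reusing the single name `a_e` for every existential is harmless in this sketch because each translated arrow type is closed with respect to it; a real implementation would pick fresh names.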

6.2.2  Typing  contexts 

The translation on typing contexts $\Psi$ and $\Gamma$ is defined in the obvious way by mapping the type
translation across the ranges of the contexts.

6.2.3 Terms


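Term translations

Small values

$\Psi; \Delta; \Gamma \vdash sv : \tau \leadsto sv'$

$|\Psi|; \Delta; |\Gamma| \vdash sv' : |\tau|$

Float values

$\Psi; \Delta; \Gamma \vdash fv : \mathrm{Float} \leadsto fv'$

$|\Psi|; \Delta; |\Gamma| \vdash fv' : \mathrm{Float}$

Operations

$\Psi; \Delta; \Gamma \vdash opr : \tau \leadsto opr'$

$|\Psi|; \Delta; |\Gamma| \vdash opr' : |\tau|\ \mathrm{opr}_{32}$

64 bit Operations

$\Psi; \Delta; \Gamma \vdash fopr : \mathrm{Float} \leadsto fopr'$

$|\Psi|; \Delta; |\Gamma| \vdash fopr' : \mathrm{Float}\ \mathrm{opr}_{64}$

Expression

$\Psi; \Delta; \Gamma \vdash e : \tau \leadsto d\ \mathrm{in}\ e'$

$|\Psi|, \Psi(d); \bullet \vdash d\ \mathrm{ok}$

$|\Psi|, \Psi(d); \Delta; |\Gamma| \vdash e' : |\tau|\ \mathrm{exp}$

Programs

$\Psi \vdash p : \tau \leadsto p'$

$|\Psi| \vdash p' : |\tau|$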

The  high-level  shape  of  the  term  level  translation  should  be  apparent  from  the  discussion  of 
the  type  translation.  The  essential  task  of  the  translation  is  to  replace  functions  with  existentially 
quantified  code/environment  pairs,  and  to  replace  function  calls  with  closure-projections  and  code 
calls. 

The term level translation is a typed translation, since the types of free variables are needed to
construct the appropriate environment types. Additionally, it is very important in the LIL that the
substitutions associated with type refinement operations be carried out as part of the translation,
since these change the set of free type variables of expressions (as well as affecting equalities between
them).

The  general  form  of  the  closure  conversion  translation  therefore  exactly  parallels  the  typing  rules 
for  the  LIL.  In  addition  to  checking  the  type  of  terms  however,  the  closure  conversion  relation  also 
“produces”  a  new  closure  converted  term.  Moreover,  in  the  case  of  the  expression  translation,  a 
new  set  of  heap  values  is  also  produced  containing  the  new  code  replacing  the  functions  from  the 
original  term. 

The  interesting  cases  from  the  translation  arise  from  application  and  function  abstraction.  In 
order  to  simplify  the  syntactic  structure  of  the  translation,  it  is  convenient  to  catch  applications 
in  the  expression  translation  rather  than  in  the  more  obvious  operation  translation.  Consequently, 
the  operation  translation  has  no  case  for  application. 

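Function application

$$\frac{\begin{array}{c}\Delta; \Gamma \vdash g : \forall[\alpha_1{::}\kappa_1, \ldots, \alpha_n{::}\kappa_n](\tau_1, \ldots, \tau_m)(\phi_1, \ldots, \phi_k) \rightarrow \tau \leadsto sv' \\ \Delta; \Gamma \vdash sv_i : \tau_i[c_1/\alpha_1, \ldots, c_n/\alpha_n] \leadsto sv_i' \qquad \Delta; \Gamma \vdash fv_i : \phi_i[c_1/\alpha_1, \ldots, c_n/\alpha_n] \leadsto fv_i' \\ \Psi; \Delta; \Gamma, x : \tau[c_1/\alpha_1, \ldots, c_n/\alpha_n] \vdash e : \tau' \leadsto d\ \mathrm{in}\ e'\end{array}}{\begin{array}{l}\Delta; \Gamma \vdash \mathrm{let}\ x = g[c_1, \ldots, c_n](sv_1, \ldots, sv_m)(fv_1, \ldots, fv_k)\ \mathrm{in}\ e : \tau' \leadsto \\ \quad d\ \mathrm{in} \\ \quad \mathrm{let}\ [\alpha_e, x_c] = \mathrm{unpack}\ sv'\ \mathrm{in} \\ \quad \mathrm{let}\ f = \mathrm{select}^1\ x_c\ \mathrm{in} \\ \quad \mathrm{let}\ x_e = \mathrm{select}^2\ x_c\ \mathrm{in} \\ \quad \mathrm{let}\ x = \mathrm{call}\ f[|c_1|, \ldots, |c_n|](x_e, sv_1', \ldots, sv_m')(fv_1', \ldots, fv_k')\ \mathrm{in}\ e'\end{array}}$$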

Since  function  definitions  are  replaced  with  closure  definitions  by  the  translation,  the  variable  g 
will  be  bound  to  a  closure  after  the  translation.  The  new  term  replacing  the  application  first  unpacks 
the  closure  to  get  at  the  underlying  code/environment  pair,  and  then  selects  out  the  components. 
The  code  pointer  thus  extracted  is  then  passed  the  environment  as  an  argument  along  with  all  of 
the  original  arguments. 
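
The shape of the rewritten application can be sketched as a small code generator. The Python below is a hypothetical illustration only: `convert_application` and its string-based output are not part of the LIL implementation, the bound names `a_e`, `x_c`, `f`, and `x_e` are fixed rather than freshly generated, and `select1`/`select2` stand for the two projections from the closure pair.

```python
def convert_application(x, closure, ty_args, args32, args64):
    """Return the let-bindings that replace
    `let x = g[ty_args](args32)(args64)`, where `closure` is the
    already-translated small value that g evaluates to."""
    call_args32 = ", ".join(["x_e"] + list(args32))  # environment goes first
    call_args64 = ", ".join(args64)
    instantiation = ", ".join(ty_args)
    return [
        f"let [a_e, x_c] = unpack {closure} in",  # open the existential
        "let f = select1 x_c in",                 # project the code pointer
        "let x_e = select2 x_c in",               # project the environment
        f"let {x} = call f[{instantiation}]({call_args32})({call_args64}) in",
    ]


bindings = convert_application("x", "g", ["|c1|"], ["sv1", "sv2"], ["fv1"])
```

The only change to the argument lists is the extra `x_e` at the head of the 32 bit arguments, mirroring the extra type at the head of the 32 bit argument type list in the type translation.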
Function abstraction


The  translation  of  function  definitions  is  by  far  the  most  complicated  part  of  closure  conversion. 
The  complete  translation  rule  is  given  in  figure  6.1.  Conceptually,  it  can  be  separated  into  two 
steps:  producing  code  to  construct  the  environment  and  the  closure,  and  wrapping  the  function 
body  in  code  to  extract  the  components  of  the  environment. 

Creating  the  closure 

The environment must contain all of the free variables of the function: that is, all of the free
variables of the function body except the function arguments and the function itself. Since the
LIL as defined lacks heterogeneous tuples, the 64 bit free variables must be boxed before they can
be placed in the environment. Consequently, the closure creation code begins by boxing up all of
the free 64 bit variables, then creates a record consisting of all the 32 bit variables followed by the
boxed 64 bit variables. This record is the function’s environment.

In  addition  to  closing  up  the  function  over  the  free  term  variables  via  the  environment,  it  is  also 
necessary  to  abstract  away  all  of  the  free  type  variables.  These  are  “packaged”  up  into  the  closure 
by  immediately  instantiating  the  label  for  the  new  code  function  with  the  free  type  variables  from 
the  original  function.  Since  the  type  parameters  do  not  actually  exist  at  runtime,  there  is  no  need 
to  store  them  separately  in  the  closure  via  a  kind  existential. 

To  create  the  final  closure  value,  this  partially  instantiated  label  is  written  into  a  two  field 
record  with  the  environment  which  is  then  packed  into  the  existential  type,  hiding  the  type  of  the 
environment.  The  original  function  name  is  bound  to  the  resulting  package. 
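
The environment layout described above can be sketched as follows. This Python is an assumption-laden illustration, not the LIL algorithm: terms are encoded as nested tuples (`("var", x)`, `("lam", params, body)`, `("app", fn, args)`), the `width` map from variable names to 32 or 64 is a hypothetical stand-in for the typing context, and variables are ordered alphabetically only to make the result deterministic.

```python
def free_vars(term, bound=frozenset()):
    """Free variables of a tiny lambda-term encoded as nested tuples."""
    kind = term[0]
    if kind == "var":
        return set() if term[1] in bound else {term[1]}
    if kind == "lam":
        _, params, body = term
        return free_vars(body, bound | set(params))
    if kind == "app":
        _, fn, args = term
        return free_vars(fn, bound).union(*(free_vars(a, bound) for a in args))
    raise ValueError(f"unknown term: {kind}")


def environment_layout(fn_name, params, body, width):
    """Fields of the environment record: 32 bit variables first,
    then the boxed 64 bit variables."""
    # Arguments and the function's own name are not part of the environment.
    fvs = free_vars(body, frozenset(params) | {fn_name})
    vars32 = sorted(v for v in fvs if width[v] == 32)
    vars64 = sorted(v for v in fvs if width[v] == 64)
    return vars32 + [f"box({v})" for v in vars64]


# Body of f(x) = g(x, y, r, f), where y is a 32 bit variable and r a float.
body = ("app", ("var", "g"), [("var", "x"), ("var", "y"), ("var", "r"), ("var", "f")])
layout = environment_layout("f", ["x"], body, {"g": 32, "y": 32, "r": 64})
```

The recursion variable `f` is excluded here because the function body rebuilds its own closure rather than finding it in the environment.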


Restoring  free  variables  from  the  environment 

The  code  from  the  previous  section,  executed  at  the  site  of  the  original  function  definition,  serves 
to  create  the  closure  which  is  then  passed  around  in  place  of  the  function.  A  second  code  sequence 
must  also  be  added  as  a  prelude  to  the  function  body  to  restore  the  free  variables  of  the  function 
from  the  environment.  This  code  can  be  seen  in  the  fcode  definition  from  the  function  translation 
rule. 

The  initial  segment  of  the  un-packaging  code  simply  projects  out  the  32  bit  variables  from  the 
environment.  The  second  segment  projects  out  and  un-boxes  the  boxed  64  bit  variables  from  the 
environment.  The  final  segment  creates  a  closure  to  bind  the  function  name  to  for  internal  uses. 
This  consists  of  allocating  a  tuple  to  hold  the  code  pointer  and  the  environment,  and  packing  it 
into  the  existential.  The  necessity  of  recreating  the  closure  tuple  inside  of  recursive  functions  is  the 
principal  disadvantage  of  the  “recursive-code”  approach  to  closure  conversion.  Note  though  that 
this  is  only  necessary  for  escaping  occurrences  of  the  recursion  variable:  non-escaping  occurrences 
can  be  replaced  by  uses  of  the  code  pointer  and  the  original  environment.  This  will  be  discussed 
in  more  detail  in  section  9.3. 
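
A runtime picture of the prelude, again as a hedged Python sketch rather than LIL code: `sum_to_code`, `make_sum_to`, and `call` are hypothetical names, closures are plain `(code, env)` tuples, and the existential packaging is omitted.

```python
def sum_to_code(env, n):
    # Prelude: restore the free variable from the environment.
    (step,) = env
    # Rebuild a closure for the recursion variable. This allocation on every
    # call is the cost of the "recursive code" approach.
    f = (sum_to_code, env)
    if n <= 0:
        return 0
    # A use of f in application position: project, then call the code.
    code, f_env = f
    return n + code(f_env, n - step)


def make_sum_to(step):
    # At the definition site: build the environment and pair it with the code.
    env = (step,)
    return (sum_to_code, env)


def call(closure, *args):
    code, env = closure
    return code(env, *args)


total = call(make_sum_to(2), 6)  # 6 + 4 + 2 + 0
```

Because the occurrence of `f` above does not escape, the tuple could be skipped and `sum_to_code(env, n - step)` called directly; the rebuilt closure is only required when `f` is stored or passed elsewhere.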

6.2.4  Heap  Values  and  Programs 

Code functions are already closed, and hence are not directly closure-converted. However, in the
LIL as defined, they may contain normal functions as sub-terms. Therefore, the bodies of code
functions are translated just as any other expression.
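Premises:

$\Delta \vdash \kappa_i\ \mathrm{ok} \quad (i \in 1 \ldots k)$
$\Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k \vdash \tau_i : T_{32} \quad (i \in 1 \ldots m)$
$\Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k \vdash \phi_i : T_{64} \quad (i \in 1 \ldots n)$
$\Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k \vdash \tau : T_{32}$
$\Delta \vdash \tau_f = \forall[\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k](\tau_1, \ldots, \tau_m)(\phi_1, \ldots, \phi_n) \rightarrow \tau : T_{32}$
$\Psi; \Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k; \Gamma, f{:}\tau_f, x_1{:}\tau_1, \ldots, x_m{:}\tau_m, z_1{:}\phi_1, \ldots, z_n{:}\phi_n \vdash e_f : \tau \leadsto d\ \mathrm{in}\ e_f'$
$\Psi; \Delta; \Gamma[f : \tau_f] \vdash e : \tau' \leadsto d'\ \mathrm{in}\ e'$
$fv_{32}(e_f) \setminus \{x_1, \ldots, x_m, f\} = \{y_1, \ldots, y_q\} \qquad fv_{64}(e_f) \setminus \{z_1, \ldots, z_n\} = \{r_1, \ldots, r_p\}$
$ftv(e_f) \setminus \{\alpha_1, \ldots, \alpha_k\} = \{\beta_1, \ldots, \beta_l\}$
$\tau_e = \times[|\Gamma(y_1)|, \ldots, |\Gamma(y_q)|, \mathrm{Boxed}|\Gamma(r_1)|, \ldots, \mathrm{Boxed}|\Gamma(r_p)|]$

$\mathit{fcode} \mapsto \mathrm{code}[\beta_1{:}\Delta(\beta_1), \ldots, \beta_l{:}\Delta(\beta_l), \alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k](x_e{:}\tau_e, x_1{:}|\tau_1|, \ldots, x_m{:}|\tau_m|)(z_1{:}\phi_1, \ldots, z_n{:}\phi_n).$
    $\mathrm{let}\ y_1 = \mathrm{select}^1\ x_e\ \mathrm{in}$
    $\ldots$
    $\mathrm{let}\ y_q = \mathrm{select}^q\ x_e\ \mathrm{in}$
    $\mathrm{let}\ b_1 = \mathrm{select}^{q+1}\ x_e\ \mathrm{in}$
    $\mathrm{let}\ r_1 = \mathrm{unbox}\ b_1\ \mathrm{in}$
    $\ldots$
    $\mathrm{let}\ b_p = \mathrm{select}^{q+p}\ x_e\ \mathrm{in}$
    $\mathrm{let}\ r_p = \mathrm{unbox}\ b_p\ \mathrm{in}$
    $\mathrm{let}\ f_c = (\mathit{fcode}[\beta_1, \ldots, \beta_l], x_e)\ \mathrm{in}$
    $\mathrm{let}\ f = \mathrm{pack}\ f_c\ \mathrm{as}\ |\tau_f|\ \mathrm{hiding}\ \tau_e\ \mathrm{in}$
    $e_f'$

Conclusion:

$\Psi; \Delta; \Gamma \vdash \mathrm{let}_{\tau'}\ \mathrm{rec}\ f[\overline{\alpha{:}\kappa}](\overline{x{:}\tau})(\overline{z{:}\phi}).e_f\ \mathrm{in}\ e : \tau' \leadsto$
    $d,\ \mathit{fcode}{:}|\tau_c| \mapsto \mathit{fcode},\ d'\ \mathrm{in}$
    $\mathrm{let}\ b_1 = \mathrm{box}\ r_1\ \mathrm{in}$
    $\ldots$
    $\mathrm{let}\ b_p = \mathrm{box}\ r_p\ \mathrm{in}$
    $\mathrm{let}\ x_e = (y_1, \ldots, y_q, b_1, \ldots, b_p)\ \mathrm{in}$
    $\mathrm{let}\ f_c = (\mathit{fcode}[\beta_1, \ldots, \beta_l], x_e)\ \mathrm{in}$
    $\mathrm{let}\ f = \mathrm{pack}\ f_c\ \mathrm{as}\ |\tau_f|\ \mathrm{hiding}\ \tau_e\ \mathrm{in}$
    $e'$


Figure 6.1: Closure converting recursive function definitions

$$\frac{\ldots, x_1{:}\tau_1, \ldots, x_m{:}\tau_m, z_1{:}\phi_1, \ldots, z_n{:}\phi_n \vdash e : \tau \leadsto d\ \mathrm{in}\ e'}{\begin{array}{l}\vdash \mathrm{code}[\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k](x_1{:}\tau_1, \ldots, x_m{:}\tau_m)(z_1{:}\phi_1, \ldots, z_n{:}\phi_n).e : \\ \qquad \forall[\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k]\ \mathrm{Code}(\tau_1, \ldots, \tau_m)(\phi_1, \ldots, \phi_n)(\tau) \leadsto \\ \qquad d\ \mathrm{in}\ \mathrm{code}_{|\tau|}[\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k](x_1{:}|\tau_1|, \ldots\end{array}}$$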

Heaps  are  translated  by  translating  the  individual  heap  values,  lifting  out  any  new  heap  values 
to  the  top  level. 


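$\Psi \vdash d \leadsto d_h\ \mathrm{in}\ d'$

$$\frac{\Psi \vdash hval : \tau \leadsto d_1\ \mathrm{in}\ hval' \qquad \Psi \vdash d \leadsto d_h\ \mathrm{in}\ d'}{\Psi[\ell{:}\tau] \vdash d, \ell{:}\tau \mapsto hval \leadsto d_1, d_h\ \mathrm{in}\ d', \ell{:}|\tau| \mapsto hval'}$$

The translation on programs rewrites the heap and the expression body to form a new program.

$$\frac{\Psi, \Psi(d) \vdash d \leadsto d_h\ \mathrm{in}\ d' \qquad \Psi, \Psi(d); \bullet \vdash e : \tau \leadsto d_e\ \mathrm{in}\ e'}{\Psi \vdash \mathrm{letrec}\ d\ \mathrm{in}\ e : \tau \leadsto \mathrm{letrec}\ d', d_h, d_e\ \mathrm{in}\ e'}$$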

6.3  Soundness  of  closure  conversion 

The proof of type preservation for the closure conversion translation is fairly straightforward, as only
a few sub-terms are actually changed. In general, most of the cases follow directly by induction,
reconstructing the original term or derivation from the inductively re-written sub-terms or
sub-derivations. Only in the few cases where the immediate term is re-written does the proof involve
anything more substantial.

6.3.1  Constructors 

As with the MIL to LIL translation, I begin by proving the soundness of the constructor translation,
and then proceed with auxiliary lemmas showing that the constructor translation commutes
with substitution, and respects equivalence. The latter two lemmas are useful for the term level
proofs.

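Theorem 10 (Soundness of the constructor translation)

If $\Delta \vdash c : \kappa$ then $\Delta \vdash |c| : \kappa$.

Proof: By induction on the structure of types. For all types and constructors except the type of
polymorphic functions, the translation leaves the structure of the type the same, and so the proof
follows straightforwardly by induction, producing the new derivation from the inductively obtained
sub-derivations using the same rule as in the original derivation. The only interesting rule is the
rule for the type of polymorphic functions.

Suppose the type being translated is of the form $\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n]. \rightarrow (c_1)(c_2)(c_3)$.

By inversion of the assumed derivation:

$\Delta \vdash \kappa_i\ \mathrm{ok} \quad (i \in 1 \ldots n)$
$\Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n \vdash c_1 : T_{32}\mathrm{list}$
$\Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n \vdash c_2 : T_{64}\mathrm{list}$
$\Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n \vdash c_3 : T_{32}$

By induction:

$\Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n \vdash |c_1| : T_{32}\mathrm{list}$
$\Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n \vdash |c_2| : T_{64}\mathrm{list}$
$\Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n \vdash |c_3| : T_{32}$

By weakening:

$\Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n, \alpha_e{:}T_{32} \vdash |c_1| : T_{32}\mathrm{list}$
$\Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n, \alpha_e{:}T_{32} \vdash |c_2| : T_{64}\mathrm{list}$
$\Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n, \alpha_e{:}T_{32} \vdash |c_3| : T_{32}$

(Note that $\alpha_e$ can always be chosen appropriately)

By construction (cons):

$\Delta, \alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n, \alpha_e{:}T_{32} \vdash (\alpha_e :: |c_1|) : T_{32}\mathrm{list}$

By construction (application and universal introduction):

$\Delta, \alpha_e{:}T_{32} \vdash \forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n].\ \mathrm{Code}(\alpha_e :: |c_1|)(|c_2|)(|c_3|) : T_{32}$

By construction (pairing and existential introduction):

$\Delta \vdash \exists[\alpha_e{:}T_{32}].\ \times[\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n].\ \mathrm{Code}(\alpha_e :: |c_1|)(|c_2|)(|c_3|),\ \alpha_e] : T_{32}$

■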

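Lemma 23 (The type translation commutes with substitution)

$|c[c'/\alpha]| = |c|[|c'|/\alpha]$

Proof: By induction on $c$.

1. If $c$ is a constant, then $\alpha \notin fv(c), fv(|c|)$, so $|c|[|c'|/\alpha] = |c| = |c[c'/\alpha]|$.

2. If $c$ is a variable $\alpha'$:

(a) If $\alpha \neq \alpha'$ then as in the previous case.

(b) If $\alpha = \alpha'$ then $|\alpha|[|c'|/\alpha] = \alpha[|c'|/\alpha] = |c'| = |\alpha[c'/\alpha]|$.

3. If $c = \lambda(\beta{:}\kappa).c_2$, then by definition

$|\lambda(\beta{:}\kappa).c_2|[|c'|/\alpha] = (\lambda(\beta{:}|\kappa|).|c_2|)[|c'|/\alpha] = \lambda(\beta{:}|\kappa|).(|c_2|[|c'|/\alpha])$

By induction and the definition of substitution:

$\lambda(\beta{:}|\kappa|).(|c_2|[|c'|/\alpha]) = \lambda(\beta{:}|\kappa|).(|c_2[c'/\alpha]|) = |\lambda(\beta{:}\kappa).(c_2[c'/\alpha])|$

Note, I rely on alpha-variance to ensure non-capture.

4. If $c = \forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n]. \rightarrow (c_1)(c_2)(c_3)$ then by the definition of the translation and of
substitution

$|\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n]. \rightarrow (c_1)(c_2)(c_3)|[|c'|/\alpha]$
$\quad = (\exists[\alpha_e{:}T_{32}].\ \times[\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n].\ \mathrm{Code}(\alpha_e :: |c_1|)(|c_2|)(|c_3|),\ \alpha_e])[|c'|/\alpha]$
$\quad = \exists[\alpha_e{:}T_{32}].\ \times[\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n].\ \mathrm{Code}(\alpha_e :: |c_1|[|c'|/\alpha])(|c_2|[|c'|/\alpha])(|c_3|[|c'|/\alpha]),\ \alpha_e]$

Note that I rely on alpha-variance to avoid capture.
Finally, by induction, and by the definition of substitution:

$\exists[\alpha_e{:}T_{32}].\ \times[\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n].\ \mathrm{Code}(\alpha_e :: |c_1|[|c'|/\alpha])(|c_2|[|c'|/\alpha])(|c_3|[|c'|/\alpha]),\ \alpha_e]$
$\quad = \exists[\alpha_e{:}T_{32}].\ \times[\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n].\ \mathrm{Code}(\alpha_e :: |c_1[c'/\alpha]|)(|c_2[c'/\alpha]|)(|c_3[c'/\alpha]|),\ \alpha_e]$
$\quad = |\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n]. \rightarrow (c_1)(c_2)(c_3)[c'/\alpha]|$

5. If $c = c_1\,c_2$, $\pi_i\,c$, $(c_1, c_2)$, the proof proceeds similarly.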
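
The statement of the lemma can be checked on a concrete instance with a few lines of Python. This is a sanity check under stated assumptions, not a proof: types are encoded as nested tuples, `translate` and `subst` are hypothetical helpers, kinds are omitted, and non-capture is assumed by requiring that bound names (including the fixed environment variable `a_e`) differ from the substituted variable and from the free variables of the substituted type.

```python
def subst(c, a, new):
    """Substitute `new` for the variable `a` in `c` (assumes no capture)."""
    tag = c[0]
    if tag == "var":
        return new if c[1] == a else c
    if tag == "const":
        return c
    if tag in ("arrow", "code"):
        _, tyvars, args32, args64, ret = c
        if a in tyvars:  # a is shadowed by the quantifier
            return c
        return (tag, tyvars,
                tuple(subst(t, a, new) for t in args32),
                tuple(subst(t, a, new) for t in args64),
                subst(ret, a, new))
    if tag == "pair":
        return ("pair", subst(c[1], a, new), subst(c[2], a, new))
    if tag == "exists":
        return c if c[1] == a else ("exists", c[1], subst(c[2], a, new))
    raise ValueError(tag)


def translate(c):
    """Closure-convert a source type built from var, const, and arrow."""
    tag = c[0]
    if tag in ("var", "const"):
        return c
    if tag == "arrow":
        _, tyvars, args32, args64, ret = c
        env = ("var", "a_e")
        code = ("code", tyvars,
                (env,) + tuple(translate(t) for t in args32),
                tuple(translate(t) for t in args64),
                translate(ret))
        return ("exists", "a_e", ("pair", code, env))
    raise ValueError(tag)


INT, FLOAT = ("const", "Int"), ("const", "Float")
c = ("arrow", (), (("var", "a"), INT), (FLOAT,), ("var", "a"))
c_prime = ("arrow", (), (INT,), (), INT)

lhs = translate(subst(c, "a", c_prime))             # |c[c'/a]|
rhs = subst(translate(c), "a", translate(c_prime))  # |c|[|c'|/a]
```

Both sides come out as the same existential package, with the translated `c_prime` appearing wherever `a` occurred.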


Lemma 24 (The type translation respects equivalence)

If $\Delta \vdash c_1 \equiv c_2 : \kappa$ then $\Delta \vdash |c_1| \equiv |c_2| : \kappa$.

Proof: By induction on equivalence derivations. The proof is straightforward, with almost all
cases identical to the proof of lemma 10 from chapter 5. The only case where the derivation
changes is when the last rule used is the structural rule for constructor application $c_1\,c_2$.
Inductively, equivalence derivations exist for the sub-terms. If the
applications match the form of a polymorphic function (that is, $\forall[\kappa_1](\ldots(\forall[\kappa_n](\rightarrow(c_x)(c_z)(c_r))))$),
then the application, reflexivity, pair, and existential equivalence rules must be applied to construct
the equivalence derivation for the translation. If the applications do not match such a form, then
the application rule suffices. All of the primitive types follow directly by reflexivity or by the
structural application rule. I give some example cases here.

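1. Suppose $\Delta \vdash c \equiv c : \kappa$ by reflexivity.

By assumption:

$\Delta \vdash c : \kappa$

By theorem 10:

$\Delta \vdash |c| : \kappa$

By reflexivity:

$\Delta \vdash |c| \equiv |c| : \kappa$

2. Suppose $\Delta \vdash (\lambda(\alpha{:}\kappa_1).c_1)(c_2) \equiv c_1[c_2/\alpha] : \kappa_2$.

By assumption:

$\Delta \vdash \kappa_1\ \mathrm{ok}$
$\Delta, \alpha{:}\kappa_1 \vdash c_1 : \kappa_2$
$\Delta \vdash c_2 : \kappa_1$

By assumption, and by theorem 10:

$\Delta \vdash \kappa_1\ \mathrm{ok}$
$\Delta, \alpha{:}\kappa_1 \vdash |c_1| : \kappa_2$
$\Delta \vdash |c_2| : \kappa_1$

By the $\lambda$ beta rule and the definition of the translations:

$\Delta \vdash |(\lambda(\alpha{:}\kappa_1).c_1)(c_2)| \equiv |c_1|[|c_2|/\alpha] : \kappa_2$

Finally, by lemma 23:

$\Delta \vdash |(\lambda(\alpha{:}\kappa_1).c_1)(c_2)| \equiv |c_1[c_2/\alpha]| : \kappa_2$

3. Suppose $\Delta \vdash \forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n]. \rightarrow (c_1)(c_2)(c_3) \equiv \forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n]. \rightarrow (c_1')(c_2')(c_3') : T_{32}$.

By induction and weakening:

$\Delta, \alpha_e{:}T_{32}, \alpha_1{:}\kappa_1 \ldots \alpha_n{:}\kappa_n \vdash |c_1| \equiv |c_1'| : T_{32}\mathrm{list}$
$\Delta, \alpha_e{:}T_{32}, \alpha_1{:}\kappa_1 \ldots \alpha_n{:}\kappa_n \vdash |c_2| \equiv |c_2'| : T_{64}\mathrm{list}$
$\Delta, \alpha_e{:}T_{32}, \alpha_1{:}\kappa_1 \ldots \alpha_n{:}\kappa_n \vdash |c_3| \equiv |c_3'| : T_{32}$

By reflexivity:

$\Delta, \alpha_e{:}T_{32}, \alpha_1{:}\kappa_1 \ldots \alpha_n{:}\kappa_n \vdash \alpha_e \equiv \alpha_e : T_{32}$

By the pair and existential rules:

$\Delta \vdash \exists[\alpha_e{:}T_{32}].\ \times[\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n].\ \mathrm{Code}(\alpha_e :: |c_1|)(|c_2|)(|c_3|),\ \alpha_e]$
$\quad \equiv \exists[\alpha_e{:}T_{32}].\ \times[\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n].\ \mathrm{Code}(\alpha_e :: |c_1'|)(|c_2'|)(|c_3'|),\ \alpha_e] : T_{32}$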


6.3.2  Typing  contexts 

Theorem 11 (Soundness of the context translations)

1. If $\Delta \vdash \Gamma\ \mathrm{ok}$ then $\Delta \vdash |\Gamma|\ \mathrm{ok}$.

2. If $\vdash \Psi\ \mathrm{ok}$ then $\vdash |\Psi|\ \mathrm{ok}$.

Proof: By induction on typing contexts. The proof proceeds by cases on contexts.

1. Suppose $\Delta \vdash \Gamma\ \mathrm{ok}$.

• If $\Gamma = \bullet$, then its translation is empty, and hence is trivially well-formed.

• If $\Gamma = \Gamma', x{:}\tau$, then by induction, $\Delta \vdash |\Gamma'|\ \mathrm{ok}$. By theorem 10, $\Delta \vdash |\tau|\ \mathrm{ok}$, and since
the domain is unchanged by the translation, $x \notin |\Gamma'|$. Therefore, by construction
$\Delta \vdash |\Gamma'|, x{:}|\tau|\ \mathrm{ok}$.

2. The proof for heap contexts $\Psi$ proceeds identically.

■ 

6.3.3  Terms 

For  terms,  I  begin  by  stating  the  soundness  theorem  for  values  and  64  bit  operations.  For  these 
syntactic  classes,  the  statement  of  the  theorem  is  straightforward. 

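Theorem 12 (Values and 64 bit operations)

1. If $\Psi; \Delta; \Gamma \vdash sv : \tau$ and $\Psi; \Delta; \Gamma \vdash sv : \tau \leadsto sv'$ then $|\Psi|; \Delta; |\Gamma| \vdash sv' : |\tau|$.

2. If $\Psi; \Delta; \Gamma \vdash fv : \mathrm{Float}$ and $\Psi; \Delta; \Gamma \vdash fv : \mathrm{Float} \leadsto fv'$ then $|\Psi|; \Delta; |\Gamma| \vdash fv' : \mathrm{Float}$.

3. If $\Psi; \Delta; \Gamma \vdash fopr : \mathrm{Float}\ \mathrm{opr}_{64}$ and $\Psi; \Delta; \Gamma \vdash fopr : \mathrm{Float} \leadsto fopr'$ then $|\Psi|; \Delta; |\Gamma| \vdash fopr' : \mathrm{Float}\ \mathrm{opr}_{64}$.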
Proof (Sketch): By induction on terms. The closure conversion translation for each of these syntactic classes leaves the top level structure of the term intact, reconstructing it from the inductively re-written sub-terms. Therefore, the proof constructs a new derivation in each case by using the same inference rule from the original derivation on the inductively re-written sub-terms. Kinds are unchanged by the translation, and for constructors I appeal to theorem 10.

■ 

The  theorem  for  32  bit  operations  and  expressions  is  more  complicated,  since  the  expression 
translation  potentially  yields  new  heap  bindings  (code  functions)  to  be  lifted  out.  The  soundness 
theorem  for  expressions  states  that  both  the  bindings  and  the  new  expression  are  well-formed  in  the 
heap  context  obtained  by  extending  the  translation  of  the  original  heap  context  with  the  bindings 
for  the  additional  heap  values  produced  by  the  translation. 
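The shape of this statement, a translation that returns both a rewritten expression and a collection of lifted code bindings, can be illustrated with a minimal Python sketch. It is not the LIL translation: the tuple-encoded lambda calculus, the tags `pack`, `call_closure` and `code`, and the helper names `free_vars` and `convert` are all assumptions made for illustration, and types are omitted entirely.

```python
from itertools import count

def free_vars(e):
    """Free variables of a tiny lambda term encoded as nested tuples (hypothetical encoding)."""
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag == "int":
        return set()
    if tag == "lam":                      # ("lam", x, body)
        return free_vars(e[2]) - {e[1]}
    if tag == "app":                      # ("app", f, arg)
        return free_vars(e[1]) | free_vars(e[2])
    raise ValueError(f"unknown term: {tag}")

def convert(e, fresh=None):
    """Return (d, e2): the lifted code bindings d and the rewritten expression e2."""
    fresh = fresh if fresh is not None else count()
    tag = e[0]
    if tag in ("var", "int"):
        return {}, e                      # nothing to lift
    if tag == "app":
        d1, f = convert(e[1], fresh)
        d2, a = convert(e[2], fresh)
        # Applications become closure calls; bindings from both sides are kept.
        return {**d1, **d2}, ("call_closure", f, a)
    if tag == "lam":
        _, x, body = e
        d, body2 = convert(body, fresh)
        env = sorted(free_vars(e))        # variables captured by the closure
        label = f"code{next(fresh)}"
        code = ("code", env, x, body2)    # closed code block, lifted to the heap
        closure = ("pack", ("label", label), ("tuple", [("var", v) for v in env]))
        return {**d, label: code}, closure
    raise ValueError(f"unknown term: {tag}")
```

For the term `("lam", "x", ("app", ("var", "f"), ("var", "x")))`, the sketch lifts one code block whose environment is `["f"]` and returns a packed closure in place of the function, mirroring the "bindings plus new expression" result that the theorem talks about.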


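Theorem 13 (32 bit operations and expressions)

1. If

$\Psi; \Delta; \Gamma \vdash opr : \tau\ \mathrm{opr}_{32}$ and $\Psi; \Delta; \Gamma \vdash opr : \tau \leadsto opr'$

then

$|\Psi|; \Delta; |\Gamma| \vdash opr' : |\tau|\ \mathrm{opr}_{32}$

2. If

$\Psi; \Delta; \Gamma \vdash e : \tau$ exp and $\Psi; \Delta; \Gamma \vdash e : \tau \leadsto d$ in $e'$

then

$|\Psi|, \Psi(d) \vdash d$ ok and $|\Psi|, \Psi(d); \Delta; |\Gamma| \vdash e' : |\tau|$ exp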


Proof:  By  induction  on  terms. 


1. The proof for operations proceeds exactly as in the previous theorem, appealing to the induction hypothesis to obtain well-typedness derivations for sub-terms, and constructing a new derivation with the same rule as in the original derivation. Note that the case for applications is trivially true, since the operation translation is not defined on applications.

2. The proof for expressions also proceeds similarly, except in the cases for bindings of applications and functions, where the translation does incremental work.

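Suppose $\mathrm{let}\ x = sv[c_1, \ldots, c_n](sv_1, \ldots, sv_m)(fv_1, \ldots, fv_k)\ \mathrm{in}\ e$

By assumption:

$\Psi; \Delta; \Gamma \vdash sv : \forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n](\tau_1, \ldots, \tau_m)(\phi_1, \ldots, \phi_k) \rightarrow \tau$
$\Delta \vdash c_i : \kappa_i \quad i \in 1 \ldots n$
$\Psi; \Delta; \Gamma \vdash sv_i : \tau_i[c_1/\alpha_1, \ldots, c_n/\alpha_n] \quad i \in 1 \ldots m$
$\Psi; \Delta; \Gamma \vdash fv_i : \phi_i[c_1/\alpha_1, \ldots, c_n/\alpha_n] \quad i \in 1 \ldots k$
$\Psi; \Delta; \Gamma, x{:}\tau[c_1/\alpha_1, \ldots, c_n/\alpha_n] \vdash e : \tau'$ exp

$\Psi; \Delta; \Gamma \vdash sv : \forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n](\tau_1, \ldots, \tau_m)(\phi_1, \ldots, \phi_k) \rightarrow \tau \leadsto sv'$
$\Psi; \Delta; \Gamma \vdash sv_i : \tau_i[c_1/\alpha_1, \ldots, c_n/\alpha_n] \leadsto sv_i'$
$\Psi; \Delta; \Gamma \vdash fv_i : \phi_i[c_1/\alpha_1, \ldots, c_n/\alpha_n] \leadsto fv_i'$
$\Psi; \Delta; \Gamma, x{:}\tau[c_1/\alpha_1, \ldots, c_n/\alpha_n] \vdash e : \tau' \leadsto d$ in $e'$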
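It suffices to show that:

$|\Psi|, \Psi(d); \Delta; |\Gamma| \vdash$
$\quad \mathrm{let}\ [\alpha_e, x_c] = \mathrm{unpack}\ sv'\ \mathrm{in}$
$\quad \mathrm{let}\ f = \mathrm{select}^0\ x_c\ \mathrm{in}$
$\quad \mathrm{let}\ x_e = \mathrm{select}^1\ x_c\ \mathrm{in}$
$\quad \mathrm{let}\ x = \mathrm{call}\ f[|c_1|, \ldots, |c_n|](x_e, sv_1', \ldots, sv_m')(fv_1', \ldots, fv_k')$
$\quad \mathrm{in}\ e' : |\tau'|$ exp

Let

$\tau_c = \times[\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n].\mathrm{Code}(\alpha_e, |\tau_1|, \ldots, |\tau_m|)(|\phi_1|, \ldots, |\phi_k|)(|\tau|), \alpha_e]$

and

$\tau_f = \forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n].\mathrm{Code}(\alpha_e, |\tau_1|, \ldots, |\tau_m|)(|\phi_1|, \ldots, |\phi_k|)(|\tau|)$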


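So by the expression typing rules, it suffices to show:

$|\Psi|, \Psi(d); \Delta, \alpha_e{:}\mathrm{T}_{32}; |\Gamma| \vdash sv' : \exists[\alpha_e{:}\mathrm{T}_{32}].\tau_c$
$|\Psi|, \Psi(d); \Delta, \alpha_e{:}\mathrm{T}_{32}; |\Gamma|, x_c{:}\tau_c \vdash \mathrm{select}^0\ x_c : \tau_f\ \mathrm{opr}_{32}$
$|\Psi|, \Psi(d); \Delta, \alpha_e{:}\mathrm{T}_{32}; |\Gamma|, x_c{:}\tau_c \vdash \mathrm{select}^1\ x_c : \alpha_e\ \mathrm{opr}_{32}$
$|\Psi|, \Psi(d); \Delta, \alpha_e{:}\mathrm{T}_{32}; |\Gamma|, x_c{:}\tau_c, f{:}\tau_f, x_e{:}\alpha_e \vdash \mathrm{call}\ f[|c_i|](x_e, sv_j')(fv_l') : |\tau|[|c_i|/\alpha_i]\ \mathrm{opr}_{32}$
$|\Psi|, \Psi(d); \Delta, \alpha_e{:}\mathrm{T}_{32}; |\Gamma|, x_c{:}\tau_c, f{:}\tau_f, x_e{:}\alpha_e, x{:}|\tau|[|c_1|/\alpha_1, \ldots, |c_n|/\alpha_n] \vdash e' : |\tau'|$ exp

By theorem 12:

$|\Psi|, \Psi(d); \Delta; |\Gamma| \vdash sv' : |\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n](\tau_1, \ldots, \tau_m)(\phi_1, \ldots, \phi_k) \rightarrow \tau|$

And by definition:

$|\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n](\tau_1, \ldots, \tau_m)(\phi_1, \ldots, \phi_k) \rightarrow \tau| =$
$\quad \exists[\alpha_e{:}\mathrm{T}_{32}]. \times[\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_n{:}\kappa_n].\mathrm{Code}(\alpha_e, |\tau_1|, \ldots, |\tau_m|)(|\phi_1|, \ldots, |\phi_k|)(|\tau|), \alpha_e]$

So by the unpack rule:

$|\Psi|, \Psi(d); \Delta, \alpha_e{:}\mathrm{T}_{32}; |\Gamma| \vdash sv' : \exists[\alpha_e{:}\mathrm{T}_{32}].\tau_c$ for some $\tau_c$

By the variable rule:

$|\Psi|, \Psi(d); \Delta, \alpha_e{:}\mathrm{T}_{32}; |\Gamma|, x_c{:}\tau_c \vdash x_c : \times[\tau_f, \alpha_e]$

So by the select rule:

$|\Psi|, \Psi(d); \Delta, \alpha_e{:}\mathrm{T}_{32}; |\Gamma|, x_c{:}\tau_c \vdash \mathrm{select}^0\ x_c : \tau_f\ \mathrm{opr}_{32}$
$|\Psi|, \Psi(d); \Delta, \alpha_e{:}\mathrm{T}_{32}; |\Gamma|, x_c{:}\tau_c \vdash \mathrm{select}^1\ x_c : \alpha_e\ \mathrm{opr}_{32}$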

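By the variable rule:

$|\Psi|, \Psi(d); \Delta, \alpha_e{:}\mathrm{T}_{32}; |\Gamma|, x_c{:}\tau_c, f{:}\tau_f, x_e{:}\alpha_e \vdash f : \tau_f$

By theorems 10 and 12:

$\Delta \vdash |c_i| : \kappa_i \quad i \in 1 \ldots n$
$|\Psi|, \Psi(d); \Delta; |\Gamma| \vdash sv_i' : |\tau_i[c_1/\alpha_1, \ldots, c_n/\alpha_n]| \quad i \in 1 \ldots m$
$|\Psi|, \Psi(d); \Delta; |\Gamma| \vdash fv_i' : |\phi_i[c_1/\alpha_1, \ldots, c_n/\alpha_n]| \quad i \in 1 \ldots k$

By the commutation lemma (lemma 23):

$|\Psi|, \Psi(d); \Delta; |\Gamma| \vdash sv_i' : |\tau_i|[|c_1|/\alpha_1, \ldots, |c_n|/\alpha_n] \quad i \in 1 \ldots m$
$|\Psi|, \Psi(d); \Delta; |\Gamma| \vdash fv_i' : |\phi_i|[|c_1|/\alpha_1, \ldots, |c_n|/\alpha_n] \quad i \in 1 \ldots k$

Therefore by the call rule:

$|\Psi|, \Psi(d); \Delta, \alpha_e{:}\mathrm{T}_{32}; |\Gamma|, x_c{:}\tau_c, f{:}\tau_f, x_e{:}\alpha_e \vdash \mathrm{call}\ f[|c_i|](x_e, sv_j')(fv_l') : |\tau|[|c_i|/\alpha_i]\ \mathrm{opr}_{32}$

Finally, by induction:

$|\Psi|, \Psi(d); \Delta, \alpha_e{:}\mathrm{T}_{32}; |\Gamma|, x_c{:}\tau_c, f{:}\tau_f, x_e{:}\alpha_e, x{:}|\tau|[|c_1|/\alpha_1, \ldots, |c_n|/\alpha_n] \vdash e' : |\tau'|$ exp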

3.  The  case  for  function  abstraction  proceeds  in  a  similar  fashion,  building  up  well-typedness 
derivations  for  the  fcode  definitions  and  for  the  closure  creation  code. 


An  additional  lemma  relating  translations  of  heap  contexts  to  translations  of  heaps  is  needed 
for  subsequent  theorems. 

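Lemma 25 ($\Psi(d)$)

If $\Psi \vdash d \leadsto d_h$ in $d'$ then $|\Psi(d)| = \Psi(d')$.

Proof: By induction on $d$.

1. If $d = \epsilon$ then $d' = \epsilon$ and $|\Psi(d)| = \epsilon = \Psi(d')$.

2. If $d = d_r, \ell{:}\tau \mapsto hval$ then:

By assumption:

$\Psi[\ell{:}\tau] \vdash hval : \tau \leadsto d_e$ in $hval'$
$\Psi[\ell{:}\tau] \vdash d_r \leadsto d_h$ in $d_r'$
$\Psi[\ell{:}\tau] \vdash d_r, \ell{:}\tau \mapsto hval \leadsto d_e, d_h$ in $d_r', \ell{:}|\tau| \mapsto hval'$

By definition:

$|\Psi(d)| = |\Psi(d_r, \ell{:}\tau \mapsto hval)| = |\Psi(d_r), \ell{:}\tau| = |\Psi(d_r)|, \ell{:}|\tau|$

By induction:

$|\Psi(d_r)| = \Psi(d_r')$

so

$|\Psi(d_r)|, \ell{:}|\tau| = \Psi(d_r'), \ell{:}|\tau| = \Psi(d_r', \ell{:}|\tau| \mapsto hval')$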

■ 

As with expressions, translating heap values may yield additional heap bindings. The soundness theorem for the heap value translation states that both the new heap value and the new bindings are well-formed in the translation of the original context extended with bindings for the new heap entries.

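Lemma 26 (Soundness of the heap value translation)

If

$\Psi \vdash hval : \tau$ hval and $\Psi \vdash hval : \tau \leadsto d$ in $hval'$

then

$|\Psi|, \Psi(d) \vdash d$ ok and $|\Psi|, \Psi(d) \vdash hval' : |\tau|$ hval

Proof: By construction.

By assumption:

$\Psi \vdash \mathrm{code}_{\Psi}[\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k](x_1{:}\tau_1, \ldots, x_m{:}\tau_m)(z_1{:}\phi_1, \ldots, z_n{:}\phi_n).e :$
$\quad \forall[\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k]\ \mathrm{Code}(\tau_1, \ldots, \tau_m)(\phi_1, \ldots, \phi_n)(\tau)$ hval

and

$\Psi \vdash \mathrm{code}_{\Psi}[\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k](x_1{:}\tau_1, \ldots, x_m{:}\tau_m)(z_1{:}\phi_1, \ldots, z_n{:}\phi_n).e :$
$\quad \forall[\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k]\ \mathrm{Code}(\tau_1, \ldots, \tau_m)(\phi_1, \ldots, \phi_n)(\tau) \leadsto$
$\quad d$ in $\mathrm{code}_{|\Psi|}[\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k](x_1{:}|\tau_1|, \ldots, x_m{:}|\tau_m|)(z_1{:}|\phi_1|, \ldots, z_n{:}|\phi_n|).e'$

By inversion:

$\bullet \vdash \kappa_i$ ok $\quad (i \in 1 \ldots k)$
$\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k \vdash \tau_i : \mathrm{T}_{32} \quad (i \in 1 \ldots m)$
$\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k \vdash \phi_i : \mathrm{T}_{64} \quad (i \in 1 \ldots n)$
$\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k \vdash \tau : \mathrm{T}_{32}$
$\Psi; \alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k; x_1{:}\tau_1, \ldots, x_m{:}\tau_m, z_1{:}\phi_1, \ldots, z_n{:}\phi_n \vdash e : \tau$ exp
$\Psi; \alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k; x_1{:}\tau_1, \ldots, x_m{:}\tau_m, z_1{:}\phi_1, \ldots, z_n{:}\phi_n \vdash e : \tau \leadsto d$ in $e'$

By theorem 10:

$\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k \vdash |\tau_i| : \mathrm{T}_{32} \quad (i \in 1 \ldots m)$
$\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k \vdash |\phi_i| : \mathrm{T}_{64} \quad (i \in 1 \ldots n)$
$\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k \vdash |\tau| : \mathrm{T}_{32}$

By theorem 13:

$|\Psi|, \Psi(d) \vdash d$ ok and
$|\Psi|, \Psi(d); \alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k; x_1{:}|\tau_1|, \ldots, x_m{:}|\tau_m|, z_1{:}|\phi_1|, \ldots, z_n{:}|\phi_n| \vdash e' : |\tau|$ exp

Note that:

$|\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k]\ \mathrm{Code}(\tau_1, \ldots, \tau_m)(\phi_1, \ldots, \phi_n)(\tau)|$
$\quad = \forall[\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k]\ \mathrm{Code}(|\tau_1|, \ldots, |\tau_m|)(|\phi_1|, \ldots, |\phi_n|)(|\tau|)$

So by the heap value rule:

$|\Psi|, \Psi(d) \vdash \mathrm{code}_{|\Psi|}[\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k](x_1{:}|\tau_1|, \ldots, x_m{:}|\tau_m|)(z_1{:}|\phi_1|, \ldots, z_n{:}|\phi_n|).e' :$
$\quad |\forall[\alpha_1{:}\kappa_1, \ldots, \alpha_k{:}\kappa_k]\ \mathrm{Code}(\tau_1, \ldots, \tau_m)(\phi_1, \ldots, \phi_n)(\tau)|$ hval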


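Lemma 27 (Soundness of the heap translation)

If $\Psi \vdash d$ ok and $\Psi \vdash d \leadsto d_h$ in $d'$ then $|\Psi|, \Psi(d_h) \vdash d_h, d'$ ok.

Proof: By induction on $d$. We proceed by cases on $d$.

1. $\epsilon$

By assumption:
$\vdash \Psi$ ok

By theorem 11:
$\vdash |\Psi|$ ok

By the empty heap rule:

$|\Psi| \vdash \epsilon$ ok

(Note that $\Psi(\epsilon) = \bullet$)

2. $d, \ell{:}\tau \mapsto hval$

By assumption:

$\Psi[\ell{:}\tau] \vdash d, \ell{:}\tau \mapsto hval$ ok
$\Psi[\ell{:}\tau] \vdash d, \ell{:}\tau \mapsto hval \leadsto d_e, d_h$ in $d', \ell{:}|\tau| \mapsto hval'$

By inverting assumptions:

$\Psi[\ell{:}\tau] \vdash hval : \tau$ hval
$\Psi[\ell{:}\tau] \vdash hval : \tau \leadsto d_e$ in $hval'$
$\Psi[\ell{:}\tau] \vdash d$ ok
$\Psi[\ell{:}\tau] \vdash d \leadsto d_h$ in $d'$

By induction:

$|\Psi[\ell{:}\tau]|, \Psi(d_h) \vdash d_h, d'$ ok

By lemma 26:

$|\Psi[\ell{:}\tau]|, \Psi(d_e) \vdash d_e$ ok and
$|\Psi[\ell{:}\tau]|, \Psi(d_e) \vdash hval' : |\tau|$ hval

By weakening:

$|\Psi[\ell{:}\tau]|, \Psi(d_h), \Psi(d_e) \vdash d_h, d'$ ok
$|\Psi[\ell{:}\tau]|, \Psi(d_h), \Psi(d_e) \vdash d_e$ ok

By definition:

$|\Psi[\ell{:}\tau]|, \Psi(d_h), \Psi(d_e) \vdash d_h, d', d_e$ ok
and $|\Psi[\ell{:}\tau]| = |\Psi|[\ell{:}|\tau|]$

So by the non-empty heap rule:

$|\Psi[\ell{:}\tau]|, \Psi(d_e, d_h) \vdash d_e, d_h, d', \ell{:}|\tau| \mapsto hval'$ ok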


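Theorem 14 (Programs)

If $\Psi \vdash p : \tau$ and $\Psi \vdash p : \tau \leadsto p'$ then $|\Psi| \vdash p' : |\tau|$.

Proof:

By definition, $p$ is of the form $\mathrm{letrec}\ d\ \mathrm{in}\ e$.
By assumption:

$\Psi, \Psi(d) \vdash d$ ok
$\Psi, \Psi(d) \vdash d \leadsto d_h$ in $d'$
$\Psi, \Psi(d); \bullet \vdash e : \tau$ exp
$\Psi, \Psi(d); \bullet \vdash e : \tau \leadsto d_e$ in $e'$

By lemma 25:

$|\Psi(d)| = \Psi(d')$

So by definition:

$|\Psi, \Psi(d)| = |\Psi|, |\Psi(d)| = |\Psi|, \Psi(d')$

By lemma 27:

$|\Psi, \Psi(d)|, \Psi(d_h) \vdash d_h, d'$ ok

And by theorem 13:

$|\Psi, \Psi(d)|, \Psi(d_e); \bullet \vdash e' : |\tau|$ exp

So by the above argument:

$|\Psi|, \Psi(d'), \Psi(d_h) \vdash d_h, d'$ ok
and $|\Psi|, \Psi(d'), \Psi(d_e); \bullet \vdash e' : |\tau|$ exp

By weakening:

$|\Psi|, \Psi(d'), \Psi(d_h), \Psi(d_e) \vdash d_h, d'$ ok
and $|\Psi|, \Psi(d'), \Psi(d_h), \Psi(d_e); \bullet \vdash e' : |\tau|$ exp

So by the program rule:

$|\Psi| \vdash \mathrm{letrec}\ d', d_h, d_e\ \mathrm{in}\ e' : |\tau|$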
Chapter 7


TILTAL 


Closure  converted  LIL  code  is  sufficiently  low-level  that  it  is  practical  to  translate  it  directly  into  a 
typed  assembly  language.  In  this  chapter  I  present  such  a  language,  and  define  a  simple  translation 
from  LIL  into  it.  I  call  this  assembly  language  TILT  Typed  Assembly  Language  (TILTAL),  in 
reference  to  the  original  typed  assembly  language  (TAL)  [MWCG98].  The  overall  structure  of 
TILTAL  is  very  similar  to  the  stack  version  of  TAL  [MCGW02]. 

The presentation of the TILTAL target language is intended to suggest how the ideas used to implement type analysis in the LIL translate down to the assembly code level. The actual implementation targets the TALx86 infrastructure [MCG+99], which is significantly different in many ways. I will discuss the TALx86 type system in more detail in the implementation chapters.


7. 1  TILTAL  overview 

I begin by introducing the TILTAL language in general terms and describing the typing judgements which define its static semantics. The syntax for the language is described in figure 7.1. The kind and type levels are almost entirely the same as those of the LIL and are mostly elided. Constructs for describing the types of stacks, and a kind ST classifying them, are added. Code function types are eliminated and replaced with a code type which describes the stack and register format expected by a code segment. “Nonsense” types $\mathrm{ns}_{32}$ and $\mathrm{ns}_{64}$ are introduced to describe uninitialized stack slots, with corresponding constants $\mathrm{ns}_{32}$ and $\mathrm{ns}_{64}$.

A table describing the typing judgements for TILTAL is given in figure 7.2. The complete list of inference rules for these judgements can be found in appendix C. Most TILTAL terms are judged well-typed with respect to a heap context $\Psi$ which binds labels at closed types; a constructor/kind context $\Delta$ which binds kind variables and binds constructor variables at kinds; and a register file type $\Gamma$ which describes the state of the abstract machine by mapping register names to types.

In the next several sections, I will introduce the important syntactic classes in TILTAL and briefly describe their use. For the most part, readers familiar with the existing literature on typed assembly language will find nothing substantially different.

7.1.1  Stacks 

The treatment of stacks and stack types in TILTAL is much the same as in previous work [MCGW02]. I give a brief introduction to the syntax and the concepts here.
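$\kappa ::= \ldots \mid \mathrm{ST}$

$c, \tau, \phi, \sigma ::= \ldots$ (delete $\rightarrow$)
$\quad \mid \epsilon \mid \tau \triangleright_{32} \sigma \mid \phi \triangleright_{64} \sigma \mid \sigma_1 \circ \sigma_2 \mid \mathrm{sptr}\ \sigma$
$\quad \mid \mathrm{ns}_{64} \mid \mathrm{ns}_{32} \mid \Gamma \rightarrow 0$

$\Gamma ::= \{r_1{:}\tau_1, r_2{:}\tau_2, r_e{:}\tau_e, r_t{:}\tau_t, f_1{:}\phi_1, f_2{:}\phi_2, sp : \sigma\}$

$r ::= r_1 \mid r_2 \mid r_e \mid r_t$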

f 

Q 

s 

I 

w 

fv 

sv 

i 


ti 


I 


fl  I  f2 

relic  I  unrollc  |  inj.unioni-^  |  forgetunion 
pack[r]c  |  inj.dyn^ 

e  I  W  >32  S  I  I  >64  S 

r  I  ns64 

(■\i  \  ns32  I  tagj  I  w[^  \  sptri  |  q  w 
/  I  f  I  sp(i) 

r  I  sp(i)  I  w  I  s?;[^  |  q  sv 

movr,  sr  |  loadrr,r(z)  |  store  r(i),  sr 

malloc  r[ri, . . . ,  Tn]  (sti, . . . ,  svn)  \  mallocr  r(s?;i,  SV2) 

malloc0  r,/?;  |  malloc^  r(s?;,/?;) 

call  sv  I  brtagj  r,  sv  \  brtgdj  r,  sv 

brdynr,  sri,  sr2  |  dyntag^r 

swrite  sp(i),  sr  |  sallocn  |  sfreen 

mov  r,  sp  I  mov  sp,  sv 

unpack  [a,  r],  sv 

sub^  r,  svi,  SV2  I  upd^  svi,  SV2,  sv^ 
fmovf,/?;  I  floadrf,r  |  fstorer,/?; 
f  swrite  sp(i),fv 
svLb^f,svi,sv2  I  upd^s?;i,s?;2,/t 
vcase[ai.  dead  sv,  02]  c 
vcase[ai,  a2-  dead  sr]  c 
ref  ine[(/3, 7)]  c  |  ref  ine[f old/3]  c 
ret  I  jmpsr  |  halt,-  |  i]I\  ti;I 


hval 

::=  {w)  1  [uJ]  1  [11  U  1  dtag 

code[aiv.Ki, . . .  ,anV.Kn]T .1 

H 

::=  {li'.Ti  ^  hvali, . . .  ,in-Tn  hvaln} 

R 

::=  {ri  1-^  Wi,r2  1-^  W2,re  1-^  We,rt 

1-^  Wt, 

fl  1-^  /i,f2  1-^  ^2:sp  1-^  s} 

P 

::=  {H,R,I) 

::=  •  i:T 

A 

::=  •  A,  j  A,  a:K 

r 

::=  {ri:ri,  r2:r2,  re:re,  rt:ri,  £2 

■4>2,sp  :  a} 

Figure  7.1:  TILTAL  syntax 
TILTAL typing judgements

Kinds                       $\Delta \vdash \kappa$ ok
Constructors                $\Delta \vdash c : \kappa$
Equivalence                 $\Delta \vdash c \equiv c' : \kappa$
Coercions                   $\Delta \vdash q : \tau \Rightarrow \tau'$
Stacks                      $\Psi; \Delta \vdash s : \sigma$
64-bit values               $\Psi; \Delta \vdash l : \phi$
32-bit values               $\Psi; \Delta \vdash w : \tau$
64-bit operands             $\Psi; \Delta; \Gamma \vdash fv : \phi$
32-bit operands             $\Psi; \Delta; \Gamma \vdash sv : \tau$
Instructions                $\Psi; \Delta; \Gamma \vdash \iota \Rightarrow \Gamma'$
Instruction Sequences       $\Psi; \Delta; \Gamma \vdash I : \tau$
Heap values                 $\Psi; \Delta; \Gamma \vdash hval : \tau$ hval
Heaps                       $\vdash H : \Psi$
Well-formed Register File   $\Psi \vdash R : \Gamma$
Well-formed Program         $\vdash (H, R, I)$ ok

Figure 7.2: Judgements defining well-typedness of TILTAL programs


Stack types and values in TILTAL are used to represent the actual control stack of a program. Stack slots store return addresses of functions, as well as variables which cannot be kept in registers. At the value level a stack is either empty ($\epsilon$), or a value pushed onto a stack ($w \triangleright_{32} s$ or $l \triangleright_{64} s$). Stacks in TILTAL are permitted to contain either 32 or 64-bit values. Pointers into the stack are permitted in a limited fashion. The meta-variable $s$ is used to stand for stack values, and the meta-variable $\sigma$ is used for the type of stacks.

Stack types are built up from empty stacks using stack addition and composition. The type $\epsilon$ describes an empty stack, the type $\tau \triangleright_{32} \sigma$ describes a stack described by $\sigma$ with a 32 bit value described by $\tau$ pushed on top, and similarly for $\phi \triangleright_{64} \sigma$ when $\phi$ describes a 64 bit value. In this way, stack types closely parallel the structure of actual stacks. In addition to these constructs, stack types may also be constructed by concatenating two stack types. Such a compound stack, written $\sigma_1 \circ \sigma_2$, is equivalent to the stack obtained by prepending all of $\sigma_1$ to $\sigma_2$. Finally, the type $\mathrm{sptr}(\sigma)$ describes a pointer to a stack described by $\sigma$. This facility is used to compile exception handlers, which must maintain pointers into the stack.
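As a concrete reading of these constructors, the following Python sketch models a stack type as a list of slots with the top of the stack first. The list encoding, the helper names, and the convention that the empty stack type has size 0 are assumptions made for illustration; stack pointers and stack type variables are not modelled.

```python
# A stack type is a list of (width, type) slots; index 0 is the top of the stack.
EMPTY = []  # models the empty stack type

def push32(tau, sigma):
    """Models pushing a 32-bit slot of type tau on top of sigma."""
    return [(32, tau)] + sigma

def push64(phi, sigma):
    """Models pushing a 64-bit slot of type phi on top of sigma."""
    return [(64, phi)] + sigma

def concat(sigma1, sigma2):
    """Models stack type concatenation: all of sigma1 is prepended to sigma2."""
    return sigma1 + sigma2

def size(sigma):
    """Size in 32-bit words: 1 per 32-bit slot, 2 per 64-bit slot (empty stack assumed 0)."""
    return sum(1 if width == 32 else 2 for width, _ in sigma)
```

With this encoding, concatenation is list append, so the equivalence between a compound stack type and the stack obtained by prepending its first component holds by construction.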

For syntactic convenience, I define several operations on stack types for use in the static semantics. I define stack type lookup operations $\sigma[i]_{32}$ and $\sigma[i]_{64}$ which find the appropriately sized type in the stack $\sigma$ located $i$ words from the top of the stack.

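Definition 1 (Stack type 32 bit subscript)

$(\tau \triangleright_{32} \sigma)[0]_{32} \stackrel{\mathrm{def}}{=} \tau$

$(\tau \triangleright_{32} \sigma)[n+1]_{32} \stackrel{\mathrm{def}}{=} \sigma[n]_{32}$

$(\phi \triangleright_{64} \sigma)[n+2]_{32} \stackrel{\mathrm{def}}{=} \sigma[n]_{32}$

Definition 2 (Stack type 64 bit subscript)

$(\phi \triangleright_{64} \sigma)[0]_{64} \stackrel{\mathrm{def}}{=} \phi$

$(\tau \triangleright_{32} \sigma)[n+1]_{64} \stackrel{\mathrm{def}}{=} \sigma[n]_{64}$

$(\phi \triangleright_{64} \sigma)[n+2]_{64} \stackrel{\mathrm{def}}{=} \sigma[n]_{64}$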
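The two subscript operations can be read as partial functions that walk the stack type word by word. A minimal Python sketch follows, assuming a stack type is encoded as a list of `(width, type)` slots with the top first; the encoding and the function names `lookup32` and `lookup64` are illustrative and not part of TILTAL. Offsets that fall inside a 64-bit slot, or at a slot of the wrong width, are undefined and raise `LookupError`.

```python
def lookup32(sigma, i):
    """32-bit subscript: the type of the 32-bit slot i words from the top."""
    if not sigma:
        raise LookupError("offset past the end of the stack type")
    (width, ty), rest = sigma[0], sigma[1:]
    if width == 32 and i == 0:
        return ty
    step = 1 if width == 32 else 2        # a 64-bit slot spans two words
    if i >= step:
        return lookup32(rest, i - step)
    raise LookupError("no 32-bit slot at this offset")

def lookup64(sigma, i):
    """64-bit subscript: the type of the 64-bit slot i words from the top."""
    if not sigma:
        raise LookupError("offset past the end of the stack type")
    (width, ty), rest = sigma[0], sigma[1:]
    if width == 64 and i == 0:
        return ty
    step = 1 if width == 32 else 2
    if i >= step:
        return lookup64(rest, i - step)
    raise LookupError("no 64-bit slot at this offset")
```

For example, in the stack type `[(32, "int"), (64, "float"), (32, "ptr")]` the 64-bit slot sits at word offset 1 and the last 32-bit slot at word offset 3.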

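In a similar manner, I define stack type updates $(\sigma)[i]_{32} \leftarrow \tau$ and $(\sigma)[i]_{64} \leftarrow \phi$ which replace the type at word offset $i$ in $\sigma$ with $\tau$ or $\phi$ respectively.

Definition 3 (Stack type 32 bit update)

$(\tau \triangleright_{32} \sigma)[0]_{32} \leftarrow \tau' \stackrel{\mathrm{def}}{=} \tau' \triangleright_{32} \sigma$

$(\phi \triangleright_{64} \sigma)[0]_{32} \leftarrow \tau' \stackrel{\mathrm{def}}{=} \tau' \triangleright_{32} \mathrm{ns}_{32} \triangleright_{32} \sigma$

$(\phi \triangleright_{64} \sigma)[1]_{32} \leftarrow \tau' \stackrel{\mathrm{def}}{=} \mathrm{ns}_{32} \triangleright_{32} \tau' \triangleright_{32} \sigma$

$(\tau \triangleright_{32} \sigma)[n+1]_{32} \leftarrow \tau' \stackrel{\mathrm{def}}{=} \tau \triangleright_{32} ((\sigma)[n]_{32} \leftarrow \tau')$

$(\phi \triangleright_{64} \sigma)[n+2]_{32} \leftarrow \tau' \stackrel{\mathrm{def}}{=} \phi \triangleright_{64} ((\sigma)[n]_{32} \leftarrow \tau')$

Definition 4 (Stack type 64 bit update)

$(\tau_1 \triangleright_{32} \tau_2 \triangleright_{32} \sigma)[0]_{64} \leftarrow \phi' \stackrel{\mathrm{def}}{=} \phi' \triangleright_{64} \sigma$

$(\tau_1 \triangleright_{32} \phi \triangleright_{64} \sigma)[0]_{64} \leftarrow \phi' \stackrel{\mathrm{def}}{=} \phi' \triangleright_{64} \mathrm{ns}_{32} \triangleright_{32} \sigma$

$(\phi \triangleright_{64} \sigma)[0]_{64} \leftarrow \phi' \stackrel{\mathrm{def}}{=} \phi' \triangleright_{64} \sigma$

$(\phi \triangleright_{64} \sigma)[1]_{64} \leftarrow \phi' \stackrel{\mathrm{def}}{=} \mathrm{ns}_{32} \triangleright_{32} ((\mathrm{ns}_{32} \triangleright_{32} \sigma)[0]_{64} \leftarrow \phi')$

$(\tau \triangleright_{32} \sigma)[n+1]_{64} \leftarrow \phi' \stackrel{\mathrm{def}}{=} \tau \triangleright_{32} ((\sigma)[n]_{64} \leftarrow \phi')$

$(\phi \triangleright_{64} \sigma)[n+2]_{64} \leftarrow \phi' \stackrel{\mathrm{def}}{=} \phi \triangleright_{64} ((\sigma)[n]_{64} \leftarrow \phi')$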
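The interesting cases of the update operations are those where the written slot only partially overlaps an existing 64-bit slot: the half that is not overwritten degrades to a nonsense word. The Python sketch below is one possible reading of Definitions 3 and 4, assuming a stack type is a list of `(width, type)` slots with the top first and that the string `"ns32"` stands for the 32-bit nonsense type; the encoding and the names `update32` and `update64` are hypothetical.

```python
NS32 = (32, "ns32")  # a 32-bit slot holding the nonsense type (assumed encoding)

def update32(sigma, i, tau):
    """Replace the 32-bit word at word offset i with a slot of type tau."""
    if not sigma:
        raise LookupError("offset past the end of the stack type")
    (width, ty), rest = sigma[0], sigma[1:]
    if width == 32:
        if i == 0:
            return [(32, tau)] + rest
        return [(32, ty)] + update32(rest, i - 1, tau)
    # The top slot is 64 bits wide: writing into either half splits it.
    if i == 0:
        return [(32, tau), NS32] + rest
    if i == 1:
        return [NS32, (32, tau)] + rest
    return [(64, ty)] + update32(rest, i - 2, tau)

def update64(sigma, i, phi):
    """Replace the two words starting at word offset i with a 64-bit slot of type phi."""
    if not sigma:
        raise LookupError("offset past the end of the stack type")
    (width, ty), rest = sigma[0], sigma[1:]
    if i == 0:
        if width == 64:
            return [(64, phi)] + rest
        if not rest:
            raise LookupError("not enough room for a 64-bit slot")
        (width2, _), rest2 = rest[0], rest[1:]
        if width2 == 32:
            return [(64, phi)] + rest2        # two 32-bit slots are overwritten
        return [(64, phi), NS32] + rest2      # half of the next 64-bit slot survives as nonsense
    if width == 32:
        return [(32, ty)] + update64(rest, i - 1, phi)
    if i == 1:
        # Split the 64-bit slot: its first half becomes nonsense, its second half is overwritten.
        return [NS32] + update64([NS32] + rest, 0, phi)
    return [(64, ty)] + update64(rest, i - 2, phi)
```

In this reading, an update never changes the total size of the stack type in words, which is a useful sanity check on each clause.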

Finally,  I  define  a  size  operation  on  stack  types  in  the  obvious  manner. 

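Definition 5 (Stack type size)

$|\tau \triangleright_{32} \sigma| \stackrel{\mathrm{def}}{=} 1 + |\sigma|$

$|\phi \triangleright_{64} \sigma| \stackrel{\mathrm{def}}{=} 2 + |\sigma|$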


7.1.2  Values 

Small values in TILTAL are syntactically divided into 32 and 64-bit sizes, written with the meta-variables $w$ and $l$ respectively. The types of 32-bit values have kind $\mathrm{T}_{32}$ and are written using the meta-variable $\tau$, whereas the types of 64-bit values have kind $\mathrm{T}_{64}$ and are written with the meta-variable $\phi$. The two forms of 64-bit values are IEEE floating point numbers or nonsense. 32-bit values include integers, sum tags, pointers into the stack, nonsense values, and labels. Coercions applied to values ($q\ w$) change the type of the value but have no runtime effect. Labels serve as pointers to values allocated on the heap. As with the version of the LIL extended with allocation primitives, a heap type $\Psi$ maps labels to the types of the values to which they point.


7.1.3  Operands:  Registers  and  stack  slots 

Instruction operands are similarly divided into 32 and 64-bit versions. In addition to simple values as described above, operands can be registers or stack slots.

There  are  three  sorts  of  registers  in  TILTAL:  32-bit  registers,  64-bit  registers,  and  the  stack 
register.  By  convention,  I  reserve  one  of  the  32-bit  registers  for  the  exception  record  (as  described 
in  section  8.1.2).  Registers  are  used  to  hold  the  operands  and  intermediate  values,  and  to  pass 
arguments  to  functions.  The  registers  given  in  the  version  of  TILTAL  described  here  are  not 
specialized  to  the  x86  platform,  which  has  a  very  idiosyncratic  register  architecture. 

The stack register is used to maintain the control stack. Reading and writing stack slots is done using offsets from the front of the stack given in 32 bit strides. Allocation and deallocation of stack space are handled by special instructions. Newly allocated stack space contains nonsense values which can subsequently be overwritten. Stack slots are used for the same purposes as the 32 and 64-bit registers, as well as for holding return addresses. The call instruction pushes a return address onto the stack, which can subsequently be used by the ret instruction.
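The stack discipline just described can be mimicked by a small Python model. This is only a sketch under simplifying assumptions: every slot is one 32-bit word, return addresses are plain strings, and the class and method names are hypothetical stand-ins loosely named after the corresponding instructions.

```python
NS = "ns32"  # stands for a nonsense (uninitialized) word

class Stack:
    """A toy value-level control stack; index 0 is the top (32-bit words only)."""

    def __init__(self):
        self.words = []

    def salloc(self, n):
        # Newly allocated slots contain nonsense until they are overwritten.
        self.words = [NS] * n + self.words

    def sfree(self, n):
        if n > len(self.words):
            raise IndexError("cannot free more words than are allocated")
        del self.words[:n]

    def swrite(self, i, w):
        self.words[i] = w          # write the word i strides from the top

    def read(self, i):
        return self.words[i]       # read the word i strides from the top

    def call(self, return_addr):
        self.words.insert(0, return_addr)   # call pushes the return address

    def ret(self):
        return self.words.pop(0)   # ret pops the return address off the top
```

A caller that allocates two slots, writes to the second, and then performs a call finds its own data one word deeper while the callee runs, and back at the original offsets after the return.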


7.1.4  Instructions  and  instruction  sequences 

The  instruction  set  described  here  is  very  small  and  focuses  mainly  on  the  instructions  that  are 
interesting  from  a  typing  standpoint.  Instructions  are  provided  for  moving  values  between  the  heap, 
the  stack,  and  the  register  file  as  well  as  allocating  and  deallocating  space  on  the  stack  and  the 
heap.  Primitive  array  instructions  are  provided  to  handle  bounds  checking  internally.  Instructions 
such  as  the  array  operations  and  the  memory  allocation  operation  are  intended  to  be  implemented 
as  assembler  macros,  since  they  are  not  directly  provided  by  the  machine. 

The typing rules for instructions are somewhat different from the usual sorts of typing rules in that instead of ascribing a type to an instruction, they describe the effect that the instruction has on the type of the state. The judgement $\Psi; \Delta; \Gamma \vdash \iota \Rightarrow \Gamma'$ indicates that the instruction $\iota$, when executed in a state described by a register file type $\Gamma$, will result in a state described by $\Gamma'$. Note that instructions do not change the type of values in the heap, nor the set of free types.
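To make the "effect on the type of the state" reading concrete, here is a Python sketch of a checker that maps a register file type and an instruction to a new register file type. The rules shown are simplified assumptions made for illustration, not the rules of appendix C: the register file type is a dictionary, the stack type is a list of 32-bit slot types under the key `"sp"`, and only four instruction forms are covered.

```python
def operand_type(gamma, sv):
    """Type of a 32-bit operand: a register name or an ("int", n) literal (assumed forms)."""
    if isinstance(sv, str):
        return gamma[sv]
    if sv[0] == "int":
        return "int"
    raise TypeError(f"unsupported operand: {sv!r}")

def check_instr(gamma, instr):
    """Return the register file type that results from executing instr under gamma."""
    op = instr[0]
    if op == "mov":                       # ("mov", r, sv): r takes the type of sv
        _, r, sv = instr
        return {**gamma, r: operand_type(gamma, sv)}
    if op == "salloc":                    # ("salloc", n): n fresh nonsense slots
        _, n = instr
        return {**gamma, "sp": ["ns32"] * n + gamma["sp"]}
    if op == "sfree":                     # ("sfree", n): drop the top n slots
        _, n = instr
        if n > len(gamma["sp"]):
            raise TypeError("sfree past the bottom of the stack type")
        return {**gamma, "sp": gamma["sp"][n:]}
    if op == "swrite":                    # ("swrite", i, sv): overwrite slot i
        _, i, sv = instr
        sp = list(gamma["sp"])
        if i >= len(sp):
            raise TypeError("swrite outside the stack type")
        sp[i] = operand_type(gamma, sv)
        return {**gamma, "sp": sp}
    raise TypeError(f"unsupported instruction: {op}")
```

Each case returns a fresh dictionary rather than mutating its argument, which matches the judgement's shape: the input register file type is a premise, and the output one is a result.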

In  addition  to  standard  instructions,  there  are  several  type  instructions  which  have  no  runtime 
effect.  The  vcase  instructions  correspond  to  the  vcase  constructs  of  the  LIL,  and  the  refine 
instructions  correspond  to  the  path  refinement  constructs.  These  instructions  can  be  eliminated 
when  types  are  removed.  Note  that  I  syntactically  segregate  the  type  instructions  from  the  ordinary 
instructions,  and  give  typing  rules  for  them  only  as  elements  of  code  sequences.  This  simplifies 
things  greatly,  since  typing  the  refinement  instructions  individually  would  presumably  require  the 
instruction  typing  judgement  to  return  a  substitution  with  all  of  the  related  complications. 

An instruction sequence is a sequence of instructions and type instructions, terminated by either ret, jmp $sv$, or $\mathrm{halt}_{\tau}$. The return instruction expects the return address to be the top-most element of the stack, and pops it off before returning.
7.1.5  Heap  values,  heaps,  and  register  files


Values that are not guaranteed to fit in 32 or 64-bit slots are always allocated on the heap. Aggregate heap values include tuples of 32-bit values and arrays of 32 or 64-bit values. Individual 64-bit values are also permitted to be allocated on the heap to allow them to be “boxed” and used in contexts expecting 32-bit values. The dtag value serves as a place-holder for exception tags, where the label serves as the unique identifier of the tag. Finally, code blocks are also allocated in the heap.

Heaps  are  mappings  from  labels  to  heap  values.  Values  in  the  heap  may  contain  references  to 
other  values  in  the  heap,  but  must  be  closed  with  respect  to  types. 

Register files are mappings from register names to values of the appropriate size and type for the given register. For example, register files map 32 bit register names to 32 bit word values. Register files are classified by register file types $\Gamma$.

7.1.6  Programs 

A TILTAL program is a triple consisting of a heap $H$, a register file $R$, and an instruction sequence $I$. Execution of a TILTAL program is defined by transitions between TILTAL programs. The complete dynamic semantics for TILTAL is given in appendix C.

7.1.7  Typing  contexts 

The forms of heap contexts $\Psi$ and constructor contexts $\Delta$ are essentially unchanged from the LIL. As before, heap contexts give types to labels, and constructor contexts bind kind variables, and bind constructor variables at their kinds.

The term level context from the LIL is replaced by a register file type which describes the type of the registers and stack of the TILTAL abstract machine. The type of the register file (and consequently the stack as well) is given by a register file type $\Gamma$ which maps each register to the type of its contents. In addition to being used as part of the static semantics, register file types are included into the type level to describe the pre-conditions of code segments. The type $\Gamma \rightarrow 0$ describes a code segment which expects a register file described by $\Gamma$.

I use the notation $\Gamma(r)$ to indicate the type assigned by $\Gamma$ to a register $r$, and similarly for float registers $f$ and the stack register $sp$. I use the notation $\Gamma\{r{:}\tau\}$ to indicate the register file type obtained by replacing the type of $r$ in $\Gamma$ with $\tau$, and analogously for float registers and stack registers.


7.2  TILTAL  derived  forms 

7.2.1  Partial  instruction  sequences 

In order to allow TILTAL expressions to be built up somewhat compositionally during the translation of LIL code, I define a derived notion of partial instruction sequences.

Definition 6 (Partial instruction sequence)

A partial instruction sequence $S$ is a series of ordinary (non-refining) instructions not terminated by a jump, return or halt. That is,

$S ::= \epsilon \mid \iota; S$

I define a judgement defining well-typedness as a partial instruction sequence: $\Psi; \Delta; \Gamma \vdash S \Rightarrow \Gamma'$ with inference rules as follows:

$$\frac{}{\Psi; \Delta; \Gamma \vdash \epsilon \Rightarrow \Gamma}$$

$$\frac{\Psi; \Delta; \Gamma \vdash \iota \Rightarrow \Gamma' \qquad \Psi; \Delta; \Gamma' \vdash S \Rightarrow \Gamma''}{\Psi; \Delta; \Gamma \vdash \iota; S \Rightarrow \Gamma''}$$

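Definition 7 (Partial instruction sequence composition)

A partial instruction sequence $S$ can be composed with a second partial instruction sequence $S'$ to form a new partial instruction sequence $S; S'$ as follows:

$\epsilon; S' \stackrel{\mathrm{def}}{=} S'$

$(\iota; S); S' \stackrel{\mathrm{def}}{=} \iota; (S; S')$

The result of composing well-formed partial instruction sequences is well-formed.

Lemma 28 (Soundness of partial instruction sequence composition)

If $\Psi; \Delta; \Gamma \vdash S \Rightarrow \Gamma'$ and $\Psi; \Delta; \Gamma' \vdash S' \Rightarrow \Gamma''$ then $\Psi; \Delta; \Gamma \vdash S; S' \Rightarrow \Gamma''$

Proof: By induction on $S$. ■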

Definition 8 (Partial instruction sequence completion)

A partial instruction sequence $S$ can be completed with an instruction sequence $I$ to form a new instruction sequence $S; I$ as follows:

$\epsilon; I \stackrel{\mathrm{def}}{=} I$

$(\iota; S); I \stackrel{\mathrm{def}}{=} \iota; (S; I)$

The result of completing a well-formed partial instruction sequence with a well-formed instruction sequence is itself well-formed: in this sense, partial instruction sequence completion is sound.

Lemma 29 (Soundness of partial instruction sequence completion)

If $\Psi; \Delta; \Gamma \vdash S \Rightarrow \Gamma'$ and $\Psi; \Delta; \Gamma' \vdash I$ ok then $\Psi; \Delta; \Gamma \vdash S; I$ ok

Proof: By induction on $S$. ■
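The two lemmas say that checking threads the register file type through a sequence, so checking a composite is the same as checking its parts in order. A Python sketch of this idea follows, under the assumption that each instruction is modelled abstractly as a function from a register file type to a register file type, and a terminator as a predicate on the final register file type. All names here (`check_partial`, `compose`, `complete`, `check_full`, `set_reg`, `require`) are hypothetical.

```python
def check_partial(gamma, seq):
    """Thread gamma through a partial instruction sequence (a list of effects)."""
    for instr in seq:
        gamma = instr(gamma)
    return gamma

def compose(s1, s2):
    """Composition of partial sequences: the instructions of s1, then those of s2."""
    return s1 + s2

def complete(s, full):
    """Completion: prefix a terminated sequence (body, terminator) with s."""
    body, terminator = full
    return (s + body, terminator)

def check_full(gamma, full):
    """A terminated sequence is ok if its terminator accepts the final register file type."""
    body, terminator = full
    return terminator(check_partial(gamma, body))

def set_reg(r, ty):
    """A toy instruction effect: register r receives type ty."""
    return lambda gamma: {**gamma, r: ty}

def require(r, ty):
    """A toy terminator: accepts states in which register r has type ty."""
    return lambda gamma: gamma.get(r) == ty
```

In this model the analogue of Lemma 28 is the identity `check_partial(g, compose(s1, s2)) == check_partial(check_partial(g, s1), s2)`, and the analogue of Lemma 29 is that `check_full(g, complete(s, full))` holds whenever `full` is accepted in the state that `s` produces.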

7.2.2  Heap  fragments 

For  convenience  in  the  LIL  to  TILTAL  translation,  I  also  define  a  notion  of  partial,  or  open  heaps, 
called  heap  fragments.  Heap  fragments  are  simply  incomplete  heaps,  with  their  own  derived  typing 
judgement.  Translations  of  LIL  terms  generally  produce  heap  fragments  in  addition  to  TILTAL 
terms,  reflecting  the  fact  that  the  translation  may  place  items  in  the  static  data  segment. 

Definition  9  (Heap  fragment) 

A heap fragment $F$ is a syntactic heap, which may have open labels.

I define a judgement defining well-typedness as a heap fragment of the form $\Psi \vdash F : \Psi'$.

$$\frac{\vdash \Psi, \Psi'\ \mathrm{ok} \qquad \Psi' \vdash hval_i : \tau_i\ \mathrm{hval}}{\Psi \vdash \{\ell_1{:}\tau_1 \mapsto hval_1, \ldots, \ell_n{:}\tau_n \mapsto hval_n\} : \Psi'}$$

Two heap fragments $F_1$ and $F_2$ can be combined to form a new heap fragment $F_1; F_2$.

Definition 10 (Heap fragment combination)

$\{\ell_1{:}\tau_1 \mapsto hval_1, \ldots, \ell_n{:}\tau_n \mapsto hval_n\}; \{\ell_{n+1}{:}\tau_{n+1} \mapsto hval_{n+1}, \ldots, \ell_m{:}\tau_m \mapsto hval_m\}$
$\quad \stackrel{\mathrm{def}}{=} \{\ell_1{:}\tau_1 \mapsto hval_1, \ldots, \ell_n{:}\tau_n \mapsto hval_n, \ell_{n+1}{:}\tau_{n+1} \mapsto hval_{n+1}, \ldots, \ell_m{:}\tau_m \mapsto hval_m\}$

The result of combining two well-formed heap fragments with disjoint labels is a well-formed heap fragment.

Lemma 30 (Soundness of heap fragment combination)

If $\Psi \vdash F_1 : \Psi_1$ and $\Psi \vdash F_2 : \Psi_2$ and the domains of $F_1$ and $F_2$ are disjoint, then $\Psi \vdash F_1; F_2 : \Psi_1, \Psi_2$

Proof: By induction on $F_1$.

■ 

A heap fragment $F$ can be added to a heap $H$ to produce a new heap $H + F$.

Definition 11 (Heap fragment incorporation)

$\{\ell_1{:}\tau_1 \mapsto hval_1, \ldots, \ell_n{:}\tau_n \mapsto hval_n\} + \{\ell_{n+1}{:}\tau_{n+1} \mapsto hval_{n+1}, \ldots, \ell_m{:}\tau_m \mapsto hval_m\}$
$\quad \stackrel{\mathrm{def}}{=} \{\ell_1{:}\tau_1 \mapsto hval_1, \ldots, \ell_n{:}\tau_n \mapsto hval_n, \ell_{n+1}{:}\tau_{n+1} \mapsto hval_{n+1}, \ldots, \ell_m{:}\tau_m \mapsto hval_m\}$

The result of incorporating a well-formed heap fragment with a well-formed heap with disjoint labels is well-formed.

Lemma 31 (Soundness of heap fragment incorporation)

If $\vdash H : \Psi$ and $\Psi \vdash F : \Psi'$ and the domains of $H$ and $F$ are disjoint, then $\vdash H + F : \Psi, \Psi'$

Proof: By induction on $F$.
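Heap fragments behave like finite maps with a disjointness side condition. A minimal Python sketch follows, assuming a fragment is a dictionary from labels to `(type, heap value)` pairs and that a heap value is a tuple whose fields are either `("int", n)` or `("label", l)`. The encoding and the helper names are hypothetical, and no typing is checked: only the label bookkeeping of Definitions 10 and 11 is modelled.

```python
def combine(f1, f2):
    """Combine two fragments; their label domains must be disjoint."""
    clash = f1.keys() & f2.keys()
    if clash:
        raise ValueError(f"labels defined twice: {sorted(clash)}")
    return {**f1, **f2}

def incorporate(heap, frag):
    """Add a fragment to a heap; on this encoding it is the same disjoint union."""
    return combine(heap, frag)

def referenced_labels(hval):
    """Labels mentioned by a tuple heap value ("tuple", [fields])."""
    _, fields = hval
    return {field[1] for field in fields if field[0] == "label"}

def open_labels(frag):
    """Labels a fragment refers to but does not itself define."""
    refs = set()
    for _ty, hval in frag.values():
        refs |= referenced_labels(hval)
    return refs - set(frag)
```

A fragment that mentions an undefined label is open; combining it with a fragment that defines that label closes it, which is the situation the translation relies on when it emits static data piece by piece.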
Chapter 8


The  LIL  to  TILTAL  translation 


The final stage of compilation translates LIL programs to TILTAL code. This process makes the control structure of the program explicit via call, return, and jump instructions, and replaces variables with stack slots and registers. Since TILT uses a stack to allocate activation records, it is also necessary to account for this in the translation. Stacks have been dealt with before in the context of typed assembly language, and the language design approach taken in TILTAL is not substantially different in nature from that previously mapped out by Morrisett, et al. [MWCG98, MCGW02] (though the translation methodology is quite different). The actual code generation is not unusual except in that it preserves type information.

This  chapter  gives  a  detailed  presentation  of  the  translation  of  LIL  programs  into  TILTAL 
programs,  and  proves  the  soundness  of  this  translation  with  respect  to  an  abstract  model  of  register 
allocation. 


8.1  The  constructor  level 

The  type  translation  from  LIL  to  TILTAL  is  relatively  simple  compared  to  the  translation  from 
the  MIL.  For  the  most  part,  the  TILTAL  type  system  is  an  extension  of  the  LIL  system.  A  few 
notable  changes  at  the  type  level  are  discussed  in  more  detail  below. 

8.1.1  Kinds 

Every LIL kind is a TILTAL kind, and since the constructor translation makes no change that affects kinds, the translation on kinds is the identity. In general, I treat every LIL kind as its own translation.

8.1.2  The  constructor  translation 

The complete constructor translation is given in figure 8.1. Note that the $\to$ type constructor has no translation, since the translation is only defined on closure-converted programs. This could be made explicit by giving a syntactic variant of the LIL for closure-converted programs and defining the closure conversion translation from chapter 6 as a translation between programs in the original LIL and programs in the new syntactic variant. However, it is much simpler to consider each variant as a refinement of a broader syntactic class. In general, throughout the translation I will assume that the LIL programs under consideration are of the closure-converted variant and will not remark upon this further.

$|\alpha| \stackrel{\mathrm{def}}{=} \alpha$

$|\lambda(\alpha{:}\kappa).c| \stackrel{\mathrm{def}}{=} \lambda(\alpha{:}\kappa).|c|$

$|c_1\,c_2| \stackrel{\mathrm{def}}{=} |c_1|\,|c_2|$

$|\mathrm{Code}| \stackrel{\mathrm{def}}{=}$ the code sequence type given in definition 14

Figure 8.1: The constructor translation

The only interesting work of the constructor translation is to translate the type of code functions into code sequence types of the form $\Gamma \to 0$. This can be thought of as replacing all uses of the Code constant by a closed defined form of the same kind. This is especially convenient, because the soundness of the translation follows almost directly once the definition is shown to be well-formed: any well-formedness derivation of a LIL constructor can be turned into a well-formedness derivation of its translation by replacing all uses of the Code axiom with uses of the appropriately weakened well-formedness derivation for the definition.


The  translation  of  the  Code  type 

In order to give the appropriate definition for the Code type, it is first necessary to discuss the calling conventions used by the translation, since these must be encoded in the types of code sequences. For simplicity, the formal translation uses a so-called “caller-pops” calling convention in which a called function is expected to return the stack with the space allocated for its in-arguments intact. This convention is not especially convenient for doing tail-call elimination; consequently, the actual implementation uses a “callee-pops” calling convention in which a called function is expected to de-allocate the space allocated for its in-arguments before returning.

The  treatment  of  exception  handlers  in  the  translation  is  slightly  different  from  that  used 
by  Morrisett  et  al.  [MCGW02]  in  which  two  registers  are  dedicated  to  the  exception  handling 
mechanism.  To  correspond  more  closely  with  the  current  TILT  implementation,  I  reserve  a  single 
register  to  hold  the  current  exception  frame.  Exception  frames  contain  the  address  of  the  current 
handler,  and  a  pointer  to  the  handler’s  stack.  Old  exception  frames  are  stored  on  the  top  of  the 
handler  stack. 


Definition  12  (Exnptr  and  Exnhndler) 


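$\mathrm{Exnhndler} \stackrel{\mathrm{def}}{=} \lambda(\rho{:}ST).\ \forall[\,].\{r_1{:}\mathrm{Dyn},\ r_2{:}ns_{32},\ r_t{:}ns_{32},\ r_e{:}ns_{32},\ sp{:}\rho,\ f_1{:}ns_{64},\ f_2{:}ns_{64}\} \to 0$

$\mathrm{Exnptr} \stackrel{\mathrm{def}}{=} \lambda(\rho{:}ST).\ \times[\mathrm{Exnhndler}(\rho), \rho]$

The type of the exception handler itself is parameterized by the type of the handler stack. Notice that the handler expects an exception packet in $r_1$.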

Code function types in TILTAL must also encode the type of a function’s continuation in the form of a return address type. I define a type $\mathrm{cont} : \mathrm{nat} \to ST \to ST \to T_{32} \to T_{32}$ which describes the machine state expected by the calling code upon return. Continuation types are parameterized over the number of words of in-argument space on the stack, the remainder of the stack above the next exception handler, the remainder of the stack below the next handler, and the return type. The return value is passed in $r_t$ by convention. The return continuation cannot assume anything about the argument slots, which must be coerced to a nonsense value before returning.

Definition 13 (cont)

$\mathrm{cont} \stackrel{\mathrm{def}}{=} \lambda(\alpha_i{:}\mathrm{nat},\ \rho_1{:}ST,\ \rho_2{:}ST,\ \alpha_{ret}{:}T_{32}).$

$\quad \{r_1{:}ns_{32},\ r_2{:}ns_{32},\ f_1{:}ns_{64},\ f_2{:}ns_{64},\ r_e{:}\mathrm{Exnptr}(\rho_2),\ r_t{:}\alpha_{ret},$

$\quad\ sp{:}\ ns_{32}^{\alpha_i} \circ \rho_1 \circ \rho_2\} \to 0$

The translation of the Code type using these auxiliary definitions is given in definition 14 below. Code function types take three arguments corresponding to a list of 32 bit arguments, a list of 64 bit arguments, and a return type. In addition, the new code sequence is polymorphic over two stack variables: $\rho_1$ and $\rho_2$. In the translation of a function type, the second stack variable, $\rho_2$, classifies the stack tail expected by the current exception handler, while the first stack variable, $\rho_1$, classifies the stack between the arguments and the last handler. Functions expect all of their arguments on the stack, and the exception frame in register $r_e$. The topmost item on the stack upon entry to the function is the address of the code to which the function should return (that is, the function’s continuation). Function return values are returned in register $r_t$.

Definition 14

$|\mathrm{Code}| \stackrel{\mathrm{def}}{=} \lambda(\alpha_{32}{:}T_{32}list,\ \alpha_{64}{:}T_{64}list,\ \alpha_r{:}T_{32}).\ \forall[\rho_1{:}ST, \rho_2{:}ST].$

$\quad \{r_1{:}ns_{32},\ r_2{:}ns_{32},\ f_1{:}ns_{64},\ f_2{:}ns_{64},\ r_e{:}\mathrm{Exnptr}(\rho_2),\ r_t{:}ns_{32},$

$\quad\ sp{:}\ \mathrm{cont}(|\alpha_{32} \mathbin{@_{32}} \alpha_{64} \mathbin{@_{64}} \epsilon|)(\rho_1)(\rho_2)(\alpha_r) \rhd_{32} (\alpha_{32} \mathbin{@_{32}} (\alpha_{64} \mathbin{@_{64}} \rho_1 \circ \rho_2))\} \to 0$

The  construction  of  the  stack  type  in  definition  14  uses  defined  operators  @32  and  @64  which 
prepend  lists  of  32  bit  and  64  bit  types  (respectively)  to  stack  types.  It  is  straightforward  to  define 
these  as  object  level  recursors  within  the  type  system. 

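Definition 15 ($c \mathbin{@_{32}} \sigma$)

$c \mathbin{@_{32}} \sigma \stackrel{\mathrm{def}}{=} f(c)\sigma$ where

$f : T_{32}list \to ST \to ST \stackrel{\mathrm{def}}{=} \lambda(\alpha{:}T_{32}list).\lambda(\rho{:}ST).\ \mathrm{pr}(j,\ \alpha{:}1 + T_{32} \times j,\ \rho{:}(j \to ST),$ in

$\quad \mathrm{case}\ (\alpha)$
$\quad\quad \mathrm{inj}_1\ {*} \Rightarrow \sigma$
$\quad\quad \mathrm{inj}_2\ \beta \Rightarrow ns_{32} \rhd_{32} (\rho\,\beta))$

Definition 16 ($c \mathbin{@_{64}} \sigma$)

$c \mathbin{@_{64}} \sigma \stackrel{\mathrm{def}}{=} f(c)\sigma$ where

$f : T_{64}list \to ST \to ST \stackrel{\mathrm{def}}{=} \lambda(\alpha{:}T_{64}list).\lambda(\beta{:}ST).\ \mathrm{pr}(j,\ \alpha{:}1 + T_{32} \times j,\ \rho{:}(j \to ST),$ in

$\quad \mathrm{case}\ (\alpha)$
$\quad\quad \mathrm{inj}_1\ {*} \Rightarrow \sigma$
$\quad\quad \mathrm{inj}_2\ \beta \Rightarrow ns_{32} \rhd_{64} (\rho\,\beta))$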
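
As an informal illustration of how definition 14 assembles the stack type seen on entry to a function, the following sketch models stack types as Python lists. Everything here is an assumption made for the sketch: the tuple encoding of slots, the function names, and in particular the convention that a 64 bit slot occupies two 32 bit words when the in-argument space is measured.

```python
def prepend32(types, stack):
    """Model of (types @32 stack): push a list of 32 bit types onto a stack type."""
    return [("32", t) for t in types] + stack


def prepend64(types, stack):
    """Model of (types @64 stack): push a list of 64 bit types onto a stack type."""
    return [("64", t) for t in types] + stack


def words(stack):
    """Length of a closed stack type in 32 bit words (assumes 64 bit = 2 words)."""
    return sum(1 if width == "32" else 2 for width, _ in stack)


def code_entry_stack(args32, args64, ret):
    """Stack type on function entry: return address, then arguments, then rho1 and rho2."""
    arg_area = prepend32(args32, prepend64(args64, []))
    # The continuation type records how many words of in-argument space exist.
    cont = ("32", ("cont", words(arg_area), "rho1", "rho2", ret))
    tail = [("var", "rho1"), ("var", "rho2")]
    return [cont] + prepend32(args32, prepend64(args64, tail))


entry = code_entry_stack(["Int", "Int"], ["Float"], "Int")
```

In this model the return address sits on top of the stack, followed by the 32 bit arguments, the 64 bit arguments, and the two stack variables, which matches the shape of the `sp` component in definition 14.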

The definition also refers to the length function for stacks $|\cdot| : ST \to \mathrm{nat}$. It is straightforward to define this using primitive recursion in the same fashion as the iterated stack type and the list append functions.

8.1.3 Soundness of the type translation

Lemma 32 (Well-formedness of the Code definition)

If $\vdash \Delta\ \mathrm{ok}$ then $\Delta \vdash |\mathrm{Code}| : T_{32}list \to T_{64}list \to T_{32} \to T_{32}$.

Proof: By construction, $\vdash |\mathrm{Code}| : T_{32}list \to T_{64}list \to T_{32} \to T_{32}$. The result then follows by repeated weakening.


Theorem  15  (Soundness  of  the  constructor  translation) 

For a LIL constructor $c$, if $\Delta \vdash c : \kappa$ then $\Delta \vdash |c| : \kappa$.

Proof: By construction. Every derivation $\mathcal{D}$ of $\Delta \vdash c : \kappa$ can be turned into a derivation of $\Delta \vdash |c| : \kappa$ by replacing every use of the Code axiom with a derivation for $|\mathrm{Code}|$ obtained via lemma 32.


Theorem  16  (The  translation  respects  substitution) 

$|c|[|c'|/\alpha] = |c[c'/\alpha]|$

Proof: The Code definition ($|\mathrm{Code}|$) and the Code primitive are both closed, so

$|\mathrm{Code}|[|c'|/\alpha] = |\mathrm{Code}| = |\mathrm{Code}[c'/\alpha]|$

All other constructors remain unchanged by the translation, so the proof follows trivially.


Theorem  17  (The  translation  respects  equivalence) 

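If $\Delta \vdash c \equiv c' : \kappa$ then $\Delta \vdash |c| \equiv |c'| : \kappa$.

Proof: By construction. Every derivation of $\Delta \vdash c \equiv c' : \kappa$ can be turned into a derivation of $\Delta \vdash |c| \equiv |c'| : \kappa$ by replacing every use of the reflexivity axiom on the Code primitive with a use of the reflexivity axiom on its definition $|\mathrm{Code}|$.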


8.2  The  term  level:  preliminaries 

Most  of  the  work  of  the  translation  from  LIL  to  TILTAL  takes  place  at  the  level  of  terms.  The 
next  few  sections  will  introduce  some  of  the  key  concepts  used  in  the  translation. 

8.2.1  Register  and  stack  slot  allocation 

One of the key issues in translating from a variable binding/substitution based language to a register transfer style language is how to efficiently manage a fixed set of registers. This is the problem of register allocation. For the purposes of the formal translation, I choose to leave the specifics of register and stack slot allocation abstract. The problem of register allocation has been studied extensively, and this dissertation does not add anything substantially new to the discussion. In practice, standard techniques can be applied to determine a suitable mapping and the translation uses this information abstractly. This has the advantage of essentially making the translation parametric over the choice of allocation methods. Nonetheless, I wish to be able to show that the translation as I define it is sound. Consequently, I will place certain typing requirements on the allocation method.

The  term  level  translation  is  defined  with  respect  to  a  notion  of  an  abstract  allocator,  which  I 
generally  write  as  A.  An  allocator  is  an  object  that  is  responsible  for  mapping  program  variables 
(both  32  and  64  bit)  to  registers  and  stack  slots  of  the  appropriate  size.  In  order  to  avoid  relying 
on  any  particular  register  allocation  technology,  I  generally  remain  agnostic  as  to  the  internal  data 
structures  of  the  allocator,  defining  only  a  set  of  operations  which  any  allocator  must  support. 

Definition  17 

An  allocator  is  an  object  A  with  the  following  associated  operations: 

• For every 32 bit variable $x$, $A(x) = r$ or $A(x) = sp(i)$.

• For every 64 bit variable $x_{64}$, $A(x_{64}) = f$ or $A(x_{64}) = sp(i)$.

• $\mathrm{frmsz}(A)$ is a natural number.

• For every LIL typing context $\Gamma$ and stack type $\sigma$, $|\Gamma|_A^{\sigma} = \Gamma_A$ for some register file type $\Gamma_A$.

The most basic such operation is to give a location for a variable. For a 32 bit variable $x$, I write the location associated with $x$ by an allocator $A$ as $A(x)$. Where appropriate, I sometimes write $A[x \mapsto r]$ when $A(x) = r$ or $A[x \mapsto sp(i)]$ when $A(x) = sp(i)$. Similarly, for a 64 bit variable $x_{64}$, I write the location associated with $x_{64}$ by the allocator $A$ as $A(x_{64})$. Where appropriate, I sometimes write $A[x_{64} \mapsto f]$ when $A(x_{64}) = f$ or $A[x_{64} \mapsto sp(i)]$ when $A(x_{64}) = sp(i)$. It is perfectly reasonable (and likely) that an allocator will map multiple variables to the same registers and stack slots.¹

An  allocator  defines  and  manages  the  stack  frame  for  a  function.  Consequently,  I  require  that 
an  allocator  answer  queries  about  the  size  (in  32  bit  words)  of  the  current  frame:  written  frmsz(A). 

An allocator is responsible for managing the stack and register resources used to implement a LIL program in the TILTAL abstract machine. At various points in the program, it is necessary to write out the type of the abstract machine. Consequently, the second operation required of an allocator is to be able to generate a register file type from a given typing context $\Gamma$. Note though that an allocator is only responsible for managing the current frame and the registers: it should be parametric with respect to the rest of the stack. With this in mind, I define the translation of a LIL typing context $\Gamma$ under an allocator $A$ with respect to a stack tail $\sigma$ to be the register file type $|\Gamma|_A^{\sigma}$.

This  completely  defines  an  allocator:  an  object  A  which  assigns  locations  to  variables,  has  a 
defined  frame  size,  and  maps  every  context/stack  type  pair  to  a  register  file  type.  However,  not 
every  allocator  is  sufficient  for  the  purposes  of  the  translation,  since  there  are  many  incoherent 
allocators  which  satisfy  this  definition.  In  order  to  state  the  soundness  of  the  LIL  to  TILTAL 
translation,  I  define  the  notion  of  a  good  allocator  that  satisfies  certain  conditions. 
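
To make the allocator interface concrete, here is a minimal sketch of one possible allocator, written as an illustration rather than as TILT's implementation. It assumes a fixed lookup table for locations, the identity type translation, and a frame made only of 32 bit slots; the class name `TableAllocator` and its method names are hypothetical.

```python
NS32, NS64 = "ns32", "ns64"


class TableAllocator:
    """A table-driven allocator: locations are registers or ("sp", i) stack slots."""

    def __init__(self, locations, frame_size):
        self.locations = locations      # variable name -> "r1" | "f1" | ("sp", i)
        self.frame_size = frame_size    # frame size in 32 bit words

    def loc(self, x):
        """A(x): the location assigned to variable x."""
        return self.locations[x]

    def frmsz(self):
        """frmsz(A): the size of the current frame."""
        return self.frame_size

    def translate(self, ctx, sigma):
        """Register file type for a context, relative to the stack tail sigma."""
        rf = {"r1": NS32, "r2": NS32, "f1": NS64, "f2": NS64,
              "re": NS32, "rt": NS32,
              "sp": [NS32] * self.frame_size + list(sigma)}
        for x, ty in ctx:
            where = self.loc(x)
            if isinstance(where, tuple):    # stack slot sp(i)
                rf["sp"][where[1]] = ty
            else:                           # register
                rf[where] = ty
        return rf


alloc = TableAllocator({"x": "r1", "y": ("sp", 0)}, frame_size=2)
rf = alloc.translate([("x", "Int"), ("y", "Bool")], ["rho"])
empty_rf = alloc.translate([], ["rho"])
```

The translation of the empty context contains only nonsense types and a frame of `frmsz` nonsense slots on top of the given tail, which is the shape the good allocator conditions ask for.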

Definition  18  (Good  allocator  for  a  context) 

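Let $\Gamma_A$ be $|\Gamma|_A^{\sigma}$, where $\Gamma$ is a context and $\sigma$ is a stack type, and let $r_t$, $f_t$, and $r_e$ be designated machine registers. Then I say that an allocator $A$ is a good allocator for $\Gamma$ if:

1. For all $x$, $A(x) \neq r_e$ and $A(x) \neq r_t$, and for all $x_{64}$, $A(x_{64}) \neq f_t$.

2. $\Gamma_A(sp) = \sigma_f \circ \sigma$ and $\mathrm{frmsz}(A) = |\sigma_f|$.

3. $|\cdot|_A^{\sigma} = \{r_1{:}ns_{32},\ r_2{:}ns_{32},\ f_1{:}ns_{64},\ f_2{:}ns_{64},\ r_e{:}ns_{32},\ r_t{:}ns_{32},\ sp{:}ns_{32}^{n} \circ \sigma\}$ where $n = \mathrm{frmsz}(A)$.

4. If $\Gamma = \Gamma_1, x{:}\tau, \Gamma_2$ then:

(a) $A$ is a good allocator for $\Gamma_1, \Gamma_2$.

(b) If $A(x) = r$ then $|\Gamma|_A^{\sigma} = |\Gamma_1, \Gamma_2|_A^{\sigma}\{r{:}|\tau|\}$.

(c) If $A(x) = sp(i)$ then $|\Gamma|_A^{\sigma} = |\Gamma_1, \Gamma_2|_A^{\sigma}\{sp{:}(\sigma')[i]_{32} \leftarrow |\tau|\}$ where $|\Gamma_1, \Gamma_2|_A^{\sigma}(sp) = \sigma'$.

5. If $\Gamma = \Gamma_1, x_{64}{:}\phi, \Gamma_2$ then:

(a) $A$ is a good allocator for $\Gamma_1, \Gamma_2$.

(b) If $A(x_{64}) = f$ then $|\Gamma|_A^{\sigma} = |\Gamma_1, \Gamma_2|_A^{\sigma}\{f{:}|\phi|\}$.

(c) If $A(x_{64}) = sp(i)$ then $|\Gamma|_A^{\sigma} = |\Gamma_1, \Gamma_2|_A^{\sigma}\{sp{:}(\sigma')[i]_{64} \leftarrow |\phi|\}$ where $|\Gamma_1, \Gamma_2|_A^{\sigma}(sp) = \sigma'$.

¹ While for semantic correctness it is to be hoped that this will only occur for variables whose live ranges do not overlap, this is not required by the translation so long as the typing requirements discussed below are fulfilled.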

The  first  three  requirements  ensure  that  the  allocator  doesn’t  interfere  with  registers  and  stack 
slots  outside  of  its  control,  and  that  all  unused  registers  and  stack  slots  are  given  the  nonsense  type 
by  the  translation. 

The special registers $r_t$, $f_t$, and $r_e$ are not available for general allocation. The first two are reserved for temporary computation and returned values, and the last for exception frames.

The  second  requirement  of  a  good  allocator  is  that  the  stack  type  in  the  machine  state  returned 
from  the  translation  of  a  context  must  be  an  extension  of  the  stack  provided,  and  that  the  size  of 
the  extension  must  match  the  result  of  the  frmsz()  query. 

The  third  requirement  of  a  good  allocator  is  that  it  not  impose  extraneous  requirements  on  the 
state:  that  is,  that  the  translation  of  an  empty  typing  context  is  empty. 

The  important  requirement  to  be  a  good  allocator  is  that  the  locations  returned  from  variable 
location  queries  must  be  coherent  with  the  machine  state  obtained  by  translating  a  context  con¬ 
taining  the  query  variables.  This  constraint  is  expressed  in  the  last  two  good  allocator  constraints. 

There are two properties of interest that follow from these constraints. The first can be summarized informally as follows: if $\Gamma(x) = \tau$ then $|\Gamma|_A^{\sigma}(A(x)) = |\tau|$: that is, if the allocator assigns a variable from a typing context to a location, then the type assigned to that location in the translation of the context must be the translation of the variable type. More precisely, if $\Gamma_A = |\Gamma|_A^{\sigma}$ then:

1. For every $x$ such that $\Gamma = \Gamma_1, x{:}\tau, \Gamma_2$:

(a) If $A(x) = r$, then $\Gamma_A(r) = |\tau|$.

(b) If $A(x) = sp(i)$ and $\Gamma_A(sp) = \sigma'$ then $\sigma'[i]_{32} = |\tau|$.

2. For every $x_{64}$ such that $\Gamma = \Gamma_1, x_{64}{:}\phi, \Gamma_2$:

(a) If $A(x_{64}) = f$, then $\Gamma_A(f) = |\phi|$.

(b) If $A(x_{64}) = sp(i)$ and $\Gamma_A(sp) = \sigma'$ then $\sigma'[i]_{64} = |\phi|$.

The second property that follows from the definition is that the extension of a typing context by a single variable translates to the same machine state as the translation of the original context with the type of the location for the new variable updated with the appropriate type. Informally, $|\Gamma, x{:}\tau|_A^{\sigma} = |\Gamma|_A^{\sigma}\{A(x){:}|\tau|\}$. More precisely:

1. If $\Gamma = \Gamma', x{:}\tau$ then:

(a) If $A(x) = r$ then $|\Gamma|_A^{\sigma} = |\Gamma'|_A^{\sigma}\{r{:}|\tau|\}$.

(b) If $A(x) = sp(i)$ then $|\Gamma|_A^{\sigma} = |\Gamma'|_A^{\sigma}\{sp{:}(\sigma')[i]_{32} \leftarrow |\tau|\}$ where $|\Gamma'|_A^{\sigma}(sp) = \sigma'$.

2. If $\Gamma = \Gamma', x_{64}{:}\phi$ then:

(a) If $A(x_{64}) = f$ then $|\Gamma|_A^{\sigma} = |\Gamma'|_A^{\sigma}\{f{:}|\phi|\}$.

(b) If $A(x_{64}) = sp(i)$ then $|\Gamma|_A^{\sigma} = |\Gamma'|_A^{\sigma}\{sp{:}(\sigma')[i]_{64} \leftarrow |\phi|\}$ where $|\Gamma'|_A^{\sigma}(sp) = \sigma'$.

A  third  (indirect)  consequence  of  the  good  allocator  definition  is  that  the  translation  of  contexts 
commutes  with  substitution. 

Lemma  33  (Context  translation  substitution) 

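If $A$ is a good allocator for $\Gamma$ then

$|\Gamma|_A^{\sigma}[|c|/\alpha] = |\Gamma[c/\alpha]|_A^{\sigma[|c|/\alpha]}$

Proof: (By induction on $\Gamma$).

1. If $\Gamma = \cdot$ then

$|\Gamma|_A^{\sigma}[|c|/\alpha]$
$= \{r_1{:}ns_{32},\ r_2{:}ns_{32},\ f_1{:}ns_{64},\ f_2{:}ns_{64},\ r_e{:}ns_{32},\ r_t{:}ns_{32},\ sp{:}ns_{32}^{n} \circ \sigma\}[|c|/\alpha]$
$= \{r_1{:}ns_{32},\ r_2{:}ns_{32},\ f_1{:}ns_{64},\ f_2{:}ns_{64},\ r_e{:}ns_{32},\ r_t{:}ns_{32},\ sp{:}ns_{32}^{n} \circ \sigma[|c|/\alpha]\}$
$= |\cdot|_A^{\sigma[|c|/\alpha]}$

2. If $\Gamma = \Gamma', x{:}\tau$ then (using informal notation)

$|\Gamma|_A^{\sigma}[|c|/\alpha]$ (by definition)
$= (|\Gamma'|_A^{\sigma}\{A(x){:}|\tau|\})[|c|/\alpha]$ (by the good allocator assumption)
$= (|\Gamma'|_A^{\sigma}[|c|/\alpha])\{A(x){:}|\tau|[|c|/\alpha]\}$ (by definition)
$= (|\Gamma'|_A^{\sigma}[|c|/\alpha])\{A(x){:}|\tau[c/\alpha]|\}$ (by lemma 16)
$= |\Gamma'[c/\alpha]|_A^{\sigma[|c|/\alpha]}\{A(x){:}|\tau[c/\alpha]|\}$ (by induction)
$= |\Gamma[c/\alpha]|_A^{\sigma[|c|/\alpha]}$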

Being  a  good  allocator  is  a  very  weak  requirement  on  allocators.  In  particular,  it  does  not 
guarantee  that  the  choice  of  locations  is  semantically  well-behaved.  For  example,  the  allocator  which 
assigns  every  variable  of  a  given  type  to  the  same  location  (regardless  of  liveness)  is  a  perfectly  good 
allocator  by  this  definition:  the  fact  that  such  an  allocator  is  a  poor  choice  practically  is  irrelevant 
from  a  type-soundness  standpoint. 
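
The point that type-level coherence is independent of liveness can be checked on a toy model. The sketch below is an illustration under assumptions, not the thesis's definition: it tests only the first derived property (each variable's location carries its type in the translated context), uses the identity type translation, and all function names are hypothetical.

```python
NS32, NS64 = "ns32", "ns64"


def translate(loc, frame_size, ctx, sigma):
    """Start from the empty-context register file and update each variable's location."""
    rf = {"r1": NS32, "r2": NS32, "f1": NS64, "f2": NS64,
          "re": NS32, "rt": NS32,
          "sp": [NS32] * frame_size + list(sigma)}
    for x, ty in ctx:
        where = loc[x]
        if isinstance(where, tuple):
            rf["sp"][where[1]] = ty
        else:
            rf[where] = ty
    return rf


def lookup(rf, where):
    """Type stored at a register or at stack slot ("sp", i)."""
    return rf["sp"][where[1]] if isinstance(where, tuple) else rf[where]


def coherent(loc, frame_size, ctx, sigma):
    """Check that every variable's location has that variable's type."""
    rf = translate(loc, frame_size, ctx, sigma)
    return all(lookup(rf, loc[x]) == ty for x, ty in ctx)


# Every Int variable shares r1, regardless of liveness: still coherent.
same_type_ok = coherent({"x": "r1", "y": "r1", "z": ("sp", 0)}, 1,
                        [("x", "Int"), ("y", "Int"), ("z", "Bool")], ["rho"])

# Two variables of different types sharing r1: not coherent.
clash = coherent({"x": "r1", "y": "r1"}, 0,
                 [("x", "Int"), ("y", "Bool")], ["rho"])
```

Sharing one register among all variables of the same type passes the check, while sharing it among variables of different types fails, which is exactly the distinction the typing requirements draw.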

The notion of a good allocator for a context $\Gamma$ extends naturally to a notion of a good allocator for an expression.

Definition 19 (Good allocator for an expression)

If $\mathcal{D}$ is a derivation of $\Psi; \Delta; \Gamma \vdash e : \tau$ then I say that an allocator $A$ is a good allocator for $e$ if for all sub-derivations of $\mathcal{D}$ of the form $\Psi; \Delta'; \Gamma' \vdash e' : \tau'$ (for some $\Delta'$, $e'$, $\tau'$), $A$ is a good allocator for $\Gamma'$.

Note that this implies that a good allocator for $e$ is a good allocator for all sub-expressions of $e$. Also note that this implies that $A$ is a good allocator for $\Gamma$.

Since  expressions  also  occur  as  sub-derivations  of  operations  (in  case  expressions  and  handlers) 
it  is  necessary  to  define  an  analogous  property  for  operations. 

Definition  20  (Good  allocator  for  an  operation) 

If $\mathcal{D}$ is a derivation of $\Psi; \Delta; \Gamma \vdash opr : \tau\ \mathrm{opr}_{32}$ then I say that an allocator $A$ is a good allocator for $opr$ if for all sub-derivations of $\mathcal{D}$ of the form $\Psi; \Delta'; \Gamma' \vdash e' : \tau'$ (for some $\Delta'$, $\Gamma'$, $e'$, $\tau'$), $A$ is a good allocator for $\Gamma'$.

Note  that  this  implies  that  a  good  allocator  for  an  expression  e  is  a  good  allocator  for  all  the 
operations  in  e,  and  that  a  good  allocator  for  an  operation  opr  is  a  good  allocator  for  all  expressions 
in  opr. 

8.2.2  Derived  instructions 

It is convenient in the course of the translation to make use of some derived instructions not provided in the core instruction set but which can be defined as sequences of core instructions. In a complete implementation these instructions might be added as primitives.

The  first  defined  instructions  coerce  registers  and  stack  slots  to  the  nonsense  type. 

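Definition 21 (Junk instructions)

$\mathtt{junk}\ r \stackrel{\mathrm{def}}{=} \mathtt{mov}\ r, ns_{32}$

$\mathtt{fjunk}\ f \stackrel{\mathrm{def}}{=} \mathtt{fmov}\ f, ns_{64}$

$\mathtt{sjunk}\ sp(i) \stackrel{\mathrm{def}}{=} \mathtt{swrite}\ sp(i), ns_{32}$

$\mathtt{fsjunk}\ sp(i) \stackrel{\mathrm{def}}{=} \mathtt{fswrite}\ sp(i), ns_{64}$

For brevity, a single instruction is defined to coerce all registers to the nonsense type.

Definition 22 (Register coercion)

$\mathtt{junkregs} \stackrel{\mathrm{def}}{=} \mathtt{junk}\ r_1;\ \mathtt{junk}\ r_2;\ \mathtt{fjunk}\ f_1;\ \mathtt{fjunk}\ f_2$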

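Lemma 34 (Derived instructions)

For well-formed contexts $\Psi$, $\Delta$, and $\Gamma$, where $\Gamma(sp) = \sigma$:

• $\Psi; \Delta; \Gamma \vdash \mathtt{junk}\ r \Rightarrow \Gamma\{r{:}ns_{32}\}$

• $\Psi; \Delta; \Gamma \vdash \mathtt{fjunk}\ f \Rightarrow \Gamma\{f{:}ns_{64}\}$

• $\Psi; \Delta; \Gamma \vdash \mathtt{sjunk}\ sp(i) \Rightarrow \Gamma\{sp{:}(\sigma)[i]_{32} \leftarrow ns_{32}\}$

• $\Psi; \Delta; \Gamma \vdash \mathtt{fsjunk}\ sp(i) \Rightarrow \Gamma\{sp{:}(\sigma)[i]_{64} \leftarrow ns_{64}\}$

• $\Psi; \Delta; \Gamma \vdash \mathtt{junkregs} \Rightarrow \Gamma\{r_1{:}ns_{32}\}\{r_2{:}ns_{32}\}\{f_1{:}ns_{64}\}\{f_2{:}ns_{64}\}$

Proof: By construction.

■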
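
The effect of the junk instructions on a register file type can be mimicked with dictionary updates. This is a sketch under assumptions: register file types are Python dictionaries, stack slots are 32 bit only, and the function names simply echo the derived instruction names for readability.

```python
def junk(rf, r):
    """Type-level effect of junk r: register r gets the 32 bit nonsense type."""
    return {**rf, r: "ns32"}


def fjunk(rf, f):
    """Type-level effect of fjunk f: register f gets the 64 bit nonsense type."""
    return {**rf, f: "ns64"}


def sjunk(rf, i):
    """Type-level effect of sjunk sp(i): slot i gets the 32 bit nonsense type."""
    sp = list(rf["sp"])
    sp[i] = "ns32"
    return {**rf, "sp": sp}


def junkregs(rf):
    """junk r1; junk r2; fjunk f1; fjunk f2, applied in sequence."""
    for r in ("r1", "r2"):
        rf = junk(rf, r)
    for f in ("f1", "f2"):
        rf = fjunk(rf, f)
    return rf


before = {"r1": "Int", "r2": "Bool", "f1": "Float", "f2": "Float",
          "rt": "Int", "sp": ["Int", "rho"]}
after = junkregs(before)
slot_junked = sjunk(before, 0)
```

Note that `junkregs` leaves `rt` and the stack untouched in this model, in line with the last clause of lemma 34.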

Finally,  I  give  an  inductive  definition  of  an  instruction  to  coerce  a  range  of  stack  slots  to  the 
nonsense  type. 

Definition 23 (Stack range coercion)

$\mathtt{junkstack}\ n \ldots n \stackrel{\mathrm{def}}{=} \mathtt{sjunk}\ sp(n);\ \epsilon$

$\mathtt{junkstack}\ n \ldots m\ (n < m) \stackrel{\mathrm{def}}{=} \mathtt{sjunk}\ sp(m);\ \mathtt{junkstack}\ n \ldots (m - 1)$

Lemma 35 (Stack range coercion)

For natural numbers $n$ and $m$ where $n \leq m$ and for well-formed contexts $\Psi$, $\Delta$, and $\Gamma$, where $\Gamma(sp) = \sigma$:

$\Psi; \Delta; \Gamma \vdash \mathtt{junkstack}\ n \ldots m \Rightarrow \Gamma\{sp{:}\sigma'\}$
where $\sigma' = (((\sigma)[n]_{32} \leftarrow ns_{32}) \ldots)[m]_{32} \leftarrow ns_{32}$

Proof: By induction on $m$. If $n = m$, then the derived instruction sequence is $\mathtt{sjunk}\ sp(m);\ \epsilon$, and the result follows directly by lemma 34. If $n < m$, then by induction, we get an instruction sequence which junks $sp(n), \ldots, sp(m - 1)$. After adding the additional $\mathtt{sjunk}\ sp(m)$ instruction, the result again follows by lemma 34.
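
The inductive definition of `junkstack` unfolds into a flat instruction list. The following sketch shows that unfolding together with its effect on a list of slot types; the tuple encoding of instructions and the helper `apply_junks` are assumptions made for illustration.

```python
def junkstack(n, m):
    """Expand junkstack n...m into a list of ("sjunk", slot) instructions."""
    if n > m:
        raise ValueError("junkstack requires n <= m")
    if n == m:
        return [("sjunk", n)]
    # Junk the highest slot first, then recurse on the remaining range.
    return [("sjunk", m)] + junkstack(n, m - 1)


def apply_junks(stack, instrs):
    """Type-level effect of a list of sjunk instructions on a list of slot types."""
    stack = list(stack)
    for _, slot in instrs:
        stack[slot] = "ns32"
    return stack


instrs = junkstack(2, 4)
result = apply_junks(["a", "b", "c", "d", "e", "f"], instrs)
```

Slots 2 through 4 end up with the nonsense type while the slots outside the range keep their types, as stated in lemma 35.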


In  addition  to  these  composite  instructions,  I  also  define  a  more  general  form  of  the  move 
instruction  which  targets  either  registers  or  stack  slots. 

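Definition 24 (Arbitrary mov and fmov)

$\mathtt{srmov}\ r, sv \stackrel{\mathrm{def}}{=} \mathtt{mov}\ r, sv$

$\mathtt{srmov}\ sp(i), sv \stackrel{\mathrm{def}}{=} \mathtt{swrite}\ sp(i), sv$

$\mathtt{srfmov}\ f, fv \stackrel{\mathrm{def}}{=} \mathtt{fmov}\ f, fv$

$\mathtt{srfmov}\ sp(i), fv \stackrel{\mathrm{def}}{=} \mathtt{fswrite}\ sp(i), fv$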

Lemma 36 (Arbitrary mov)

For a location $dest_{32}$, and for an operand $sv$ such that $\Psi; \Delta; \Gamma \vdash sv : \tau$:

• If $dest_{32} = r$ then $\Psi; \Delta; \Gamma \vdash \mathtt{srmov}\ r, sv \Rightarrow \Gamma\{r{:}\tau\}$

• If $dest_{32} = sp(i)$ and $\Gamma(sp) = \sigma$ then $\Psi; \Delta; \Gamma \vdash \mathtt{srmov}\ sp(i), sv \Rightarrow \Gamma\{sp{:}(\sigma)[i]_{32} \leftarrow \tau\}$

Proof: By construction.

Lemma 37 (Arbitrary fmov)

For a location $dest_{64}$, and for an operand $fv$ such that $\Psi; \Delta; \Gamma \vdash fv : \phi$:

• If $dest_{64} = f$ then $\Psi; \Delta; \Gamma \vdash \mathtt{srfmov}\ f, fv \Rightarrow \Gamma\{f{:}\phi\}$

• If $dest_{64} = sp(i)$ and $\Gamma(sp) = \sigma$ then $\Psi; \Delta; \Gamma \vdash \mathtt{srfmov}\ sp(i), fv \Rightarrow \Gamma\{sp{:}(\sigma)[i]_{64} \leftarrow \phi\}$

Proof: By construction.

■

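To simplify the translation (as well as eliminate unnecessary moves), I define register to register mov instructions which coerce the source register to $ns_{32}$ after the move whenever the registers are not aliases.

Definition 25 (Move and junk)

$\mathtt{movj}\ r, r \stackrel{\mathrm{def}}{=} \epsilon$

$\mathtt{movj}\ r, r'\ (r \neq r') \stackrel{\mathrm{def}}{=} \mathtt{mov}\ r, r';\ \mathtt{junk}\ r'$

$\mathtt{fmovj}\ f, f \stackrel{\mathrm{def}}{=} \epsilon$

$\mathtt{fmovj}\ f, f'\ (f \neq f') \stackrel{\mathrm{def}}{=} \mathtt{fmov}\ f, f';\ \mathtt{fjunk}\ f'$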

Lemma 38 (32 bit move and junk)

For registers $r$ and $r'$: $\Psi; \Delta; \Gamma \vdash \mathtt{movj}\ r, r' \Rightarrow \Gamma\{r'{:}ns_{32}\}\{r{:}\Gamma(r')\}$

Proof: By construction.

If the registers are aliases, then the instruction sequence is empty, and

$\Gamma\{r'{:}ns_{32}\}\{r{:}\Gamma(r')\} = \Gamma\{r{:}ns_{32}\}\{r{:}\Gamma(r)\} = \Gamma$

If the registers are different, then mov and junk instructions are emitted, which update the register file type accordingly.

Lemma 39 (64 bit move and junk)

For registers $f$ and $f'$: $\Psi; \Delta; \Gamma \vdash \mathtt{fmovj}\ f, f' \Rightarrow \Gamma\{f'{:}ns_{64}\}\{f{:}\Gamma(f')\}$

Proof: By construction. If the registers are aliases, then the instruction sequence is empty, and $\Gamma = \Gamma\{f'{:}ns_{64}\}\{f{:}\Gamma(f')\}$. If the registers are different, then fmov and fjunk instructions are emitted, which update the register file type accordingly.

■
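
The alias case of the move-and-junk instruction is easy to get wrong, so a small model may help. The sketch below expands `movj` into core instructions and runs them on a register file type; the instruction encoding and the `run` helper are assumptions for illustration only.

```python
def movj(dst, src):
    """Expand movj dst, src: nothing for aliases, otherwise a move then a junk of the source."""
    if dst == src:
        return []
    return [("mov", dst, src), ("junk", src)]


def run(rf, instrs):
    """Type-level effect of mov and junk instructions on a register file type."""
    rf = dict(rf)
    for ins in instrs:
        if ins[0] == "mov":
            rf[ins[1]] = rf[ins[2]]
        elif ins[0] == "junk":
            rf[ins[1]] = "ns32"
    return rf


gamma = {"r1": "Int", "r2": "ns32"}
moved = run(gamma, movj("r2", "r1"))
aliased = run(gamma, movj("r1", "r1"))
```

When the two registers differ, the source ends up with the nonsense type and the destination with the source's former type; when they are aliases, no instruction is emitted and the register file type is unchanged, matching both cases in the proof of lemma 38.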

For exception handlers, I define a code sequence for copying a stack segment. I begin by defining a code sequence $\mathtt{stackcopy}(\sigma_1, \sigma_2)$ which copies the portion of the stack described by $\sigma_2$ into pre-allocated space below the already copied $\sigma_1$. Note that $\mathtt{stackcopy}(\sigma_1, \sigma_2)$ is only defined when $\sigma_1$ and $\sigma_2$ have well-defined sizes: that is, when they are composed solely of pushes and of compositions of stacks with well-defined sizes.

$\mathtt{stackcopy}(\sigma_1, \epsilon) \stackrel{\mathrm{def}}{=} \epsilon$

$\mathtt{stackcopy}(\sigma_1, \tau \rhd_{32} \sigma_2) \stackrel{\mathrm{def}}{=} \mathtt{swrite}\ sp(|\sigma_1|),\ sp(|\sigma_1 \circ \tau \rhd_{32} \sigma_2| + |\sigma_1|);$
$\quad \mathtt{stackcopy}(\sigma_1 \circ (\tau \rhd_{32} \epsilon), \sigma_2)$

$\mathtt{stackcopy}(\sigma_1, \phi \rhd_{64} \sigma_2) \stackrel{\mathrm{def}}{=} \mathtt{fswrite}\ sp(|\sigma_1|),\ sp(|\sigma_1 \circ \phi \rhd_{64} \sigma_2| + |\sigma_1|);$
$\quad \mathtt{stackcopy}(\sigma_1 \circ (\phi \rhd_{64} \epsilon), \sigma_2)$


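Lemma 40 (stackcopy)

For well-formed stack types $\sigma_1$, $\sigma_2$ and $\sigma$ such that $|\sigma_1|$ and $|\sigma_2|$ are well-defined:

$\Psi; \Delta; \Gamma\{sp{:}(\sigma_1 \circ ns_{32}^{|\sigma_2|}) \circ (\sigma_1 \circ \sigma_2) \circ \sigma\} \vdash \mathtt{stackcopy}(\sigma_1, \sigma_2) \Rightarrow \Gamma\{sp{:}(\sigma_1 \circ \sigma_2) \circ (\sigma_1 \circ \sigma_2) \circ \sigma\}$

Proof: By induction on $\sigma_2$.

• If $\sigma_2$ is empty, then $|\sigma_2| = 0$, so

$(\sigma_1 \circ ns_{32}^{|\sigma_2|}) \circ (\sigma_1 \circ \sigma_2) \circ \sigma = \sigma_1 \circ \sigma_1 \circ \sigma$

• If $\sigma_2 = \tau \rhd_{32} \sigma_2'$ then:

Let $\sigma_1' = \sigma_1 \circ (\tau \rhd_{32} \epsilon)$.

By induction:

$\Psi; \Delta; \Gamma\{sp{:}(\sigma_1' \circ ns_{32}^{|\sigma_2'|}) \circ (\sigma_1' \circ \sigma_2') \circ \sigma\} \vdash \mathtt{stackcopy}(\sigma_1', \sigma_2') \Rightarrow \Gamma\{sp{:}(\sigma_1' \circ \sigma_2') \circ (\sigma_1' \circ \sigma_2') \circ \sigma\}$

Note that $\sigma_1' \circ \sigma_2' = \sigma_1 \circ (\tau \rhd_{32} \epsilon) \circ \sigma_2' = \sigma_1 \circ \tau \rhd_{32} \sigma_2' = \sigma_1 \circ \sigma_2$.

So by the partial sequence instruction rule, it suffices to show that:

$\Psi; \Delta; \Gamma\{sp{:}(\sigma_1 \circ ns_{32}^{|\sigma_2|}) \circ (\sigma_1 \circ \sigma_2) \circ \sigma\} \vdash \mathtt{swrite}\ sp(|\sigma_1|),\ sp(|\sigma_1 \circ \tau \rhd_{32} \sigma_2'| + |\sigma_1|) \Rightarrow \Gamma\{sp{:}(\sigma_1 \circ (\tau \rhd_{32} \epsilon) \circ ns_{32}^{|\sigma_2'|}) \circ (\sigma_1 \circ \sigma_2) \circ \sigma\}$

But note that $|\sigma_1 \circ ns_{32}^{|\sigma_2|} \circ \sigma_1| = |\sigma_1 \circ \tau \rhd_{32} \sigma_2'| + |\sigma_1|$, and hence

$((\sigma_1 \circ ns_{32}^{|\sigma_2|}) \circ (\sigma_1 \circ \sigma_2) \circ \sigma)[|\sigma_1 \circ \tau \rhd_{32} \sigma_2'| + |\sigma_1|]_{32} = \sigma_2[0]_{32} = \tau$

And since $ns_{32}^{|\sigma_2|} = ns_{32}^{|\tau \rhd_{32} \sigma_2'|} = ns_{32} \rhd_{32} ns_{32}^{|\sigma_2'|}$,

$((\sigma_1 \circ ns_{32}^{|\sigma_2|}) \circ (\sigma_1 \circ \sigma_2) \circ \sigma)[|\sigma_1|]_{32} \leftarrow \tau = (\sigma_1 \circ (\tau \rhd_{32} \epsilon) \circ ns_{32}^{|\sigma_2'|}) \circ (\sigma_1 \circ \sigma_2) \circ \sigma$

Which is what we wanted.

• If $\sigma_2 = \phi \rhd_{64} \sigma_2'$ then the result follows by a similar argument.

■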

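The stackcopy definition is used to define a stack duplication code sequence $\mathtt{copyframe}\ \sigma$ which emits code to duplicate $\sigma$ on the top of the stack.

$\mathtt{copyframe}\ \sigma \stackrel{\mathrm{def}}{=} \mathtt{salloc}\ |\sigma|;\ \mathtt{stackcopy}(\epsilon, \sigma)$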

64-bit values: $\Psi; \Delta; \Gamma; A, \sigma_1, \sigma_2 \vdash_{64} fv : \phi \leadsto fv'$

32-bit values: $\Psi; \Delta; \Gamma; A, \sigma_1, \sigma_2 \vdash_{32} sv : \tau \leadsto sv'$

64 bit operations

32 bit operations: $\Psi; \Delta; \Gamma; A, \sigma_1, \sigma_2 \vdash dest_{32} \leftarrow opr : \tau \leadsto S$

Expressions

Heap values

Heaps

Programs: $\vdash p : \tau \leadsto P$

Figure 8.2: LIL to TILTAL translation judgements

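Lemma 41 (copyframe)

For well-formed stack types $\sigma$ and $\sigma'$, such that $|\sigma|$ is well-defined:

$\Psi; \Delta; \Gamma\{sp{:}\sigma \circ \sigma'\} \vdash \mathtt{copyframe}\ \sigma \Rightarrow \Gamma\{sp{:}\sigma \circ \sigma \circ \sigma'\}$

Proof: By construction.

By the salloc rule, it suffices to show that

$\Psi; \Delta; \Gamma\{sp{:}ns_{32}^{|\sigma|} \circ \sigma \circ \sigma'\} \vdash \mathtt{stackcopy}(\epsilon, \sigma) \Rightarrow \Gamma\{sp{:}\sigma \circ \sigma \circ \sigma'\}$

which follows immediately by lemma 40.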
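
The offset arithmetic in stackcopy can be checked by running it on a concrete stack. The following sketch is a simulation under assumptions: stack types are lists of slot widths in 32 bit words (1 for a 32 bit slot, 2 for a 64 bit slot), the stack is a list of words with index 0 at the top, and the instruction tuples and `run` helper are hypothetical.

```python
def size(stack_type):
    """Size of a stack type in 32 bit words; slots are encoded by their width."""
    return sum(stack_type)


def stackcopy(s1, s2):
    """Emit the writes that copy s2 into the space just below the copied s1."""
    if not s2:
        return []
    width, rest = s2[0], s2[1:]
    op = "swrite" if width == 1 else "fswrite"
    dst = size(s1)                      # first slot not yet filled
    src = size(s1 + s2) + size(s1)      # same element in the original segment
    return [(op, dst, src)] + stackcopy(s1 + [width], rest)


def copyframe(s):
    """salloc |s|, then copy the whole segment into the fresh space."""
    return [("salloc", size(s))] + stackcopy([], s)


def run(stack, instrs):
    """Execute salloc, swrite and fswrite on a list of words (top of stack first)."""
    stack = list(stack)
    for ins in instrs:
        if ins[0] == "salloc":
            stack = ["ns32"] * ins[1] + stack
        else:
            _, dst, src = ins
            n = 1 if ins[0] == "swrite" else 2
            stack[dst:dst + n] = stack[src:src + n]
    return stack


frame = [1, 2, 1]                       # a 32 bit, a 64 bit, and a 32 bit slot
code = copyframe(frame)
final = run(["a", "b0", "b1", "c", "rest"], code)
```

After running the emitted sequence, the four words of the frame appear twice on top of the untouched remainder of the stack, which is the shape described by lemma 41.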


8.3  The  term  translation 

The judgement forms used for the translation of LIL programs to TILTAL programs are listed in figure 8.2. In addition to the usual LIL typing contexts, all of the expression level judgements are defined with respect to an allocator $A$ and two stack types $\sigma_1$ and $\sigma_2$. The allocator provides the locations of variables in registers and frame slots, and the two stack types keep track of the layout of the rest of the stack below the current frame: $\sigma_1$ describes the stack above the current handler and $\sigma_2$ describes the stack below it.

Within the bodies of functions, these stack types will generally refer to two additional free variables representing the two stack segments expected by the function (the segment above the enclosing handler, and the segment below it). By convention, I name these variables $\rho_1$ and $\rho_2$, and the stack types are expected to be well-formed in a context including these variables in addition to free type variables from the original program. So for example, an invariant of most of the translation judgements is that $|\Delta|, \rho_1{:}ST, \rho_2{:}ST \vdash \sigma_1 : ST$, and similarly for $\sigma_2$, where $\Delta$ is the current constructor context. For brevity, I will generally abbreviate this idiom as follows:

Definition 26

$\Delta^{\rho_1,\rho_2} \stackrel{\mathrm{def}}{=} |\Delta|, \rho_1{:}ST, \rho_2{:}ST$


The next several sections will give detailed overviews of the individual translation judgements and discuss some of the more interesting translation rules, as well as stating and proving the relevant soundness properties. The complete definition of the translation can be found at the end of the chapter.

8.3.1  Values 

LIL  32  bit  and  64  bit  values  translate  into  TILTAL  32  bit  and  64  bit  operands,  respectively.  The 
32  bit  value  translation  judgement  'h;  A;  T;  A,  cii,  cj2  bga  sv  :t  sv'  indicates  that  in  heap  context 
'h,  constructor  context  A,  term  context  T,  allocator  A,  and  stack  segments  ui  and  <72;  a  LIL  small 
value  sv  of  type  r  translates  to  a  TILTAL  operand  sv' .  The  64  bit  has  an  analogous  interpretation. 

32  bit  values 

For  closed  values,  the  translation  does  little.  For  example,  the  translation  of  an  integer  value: 


T;  A;F;  A,  ui,  cr2  F32  i :  Int  ^  i 

Similarly, for coerced values such as values injected into union types, the translation simply inductively translates the coerced value to an operand, and then applies the translated coercion.

The rules for translating variables are more interesting, however, since they use the allocator to choose locations for the variable.


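$\Psi; \Delta; \Gamma[x{:}\tau]; \mathcal{A}[x \mapsto r], \sigma_1, \sigma_2 \vdash_{32} x : \tau \rightsquigarrow r$


$\Psi; \Delta; \Gamma[x{:}\tau]; \mathcal{A}[x \mapsto sp(i)], \sigma_1, \sigma_2 \vdash_{32} x : \tau \rightsquigarrow sp(i)$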
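
The integer and variable rules above amount to a lookup in the allocator. The following is a minimal Python sketch of that idea, not the TILT implementation: the classes `Reg`, `StackSlot`, and `IntConst` and the function `translate_small_value` are hypothetical names introduced only for illustration, and types and coercions are ignored.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Reg:
    name: str  # a register such as "r1"


@dataclass(frozen=True)
class StackSlot:
    index: int  # the slot written sp(i) in the text


@dataclass(frozen=True)
class IntConst:
    value: int  # an integer literal operand


Location = Union[Reg, StackSlot]
Operand = Union[Reg, StackSlot, IntConst]


def translate_small_value(sv, allocator: Dict[str, Location]) -> Operand:
    """Translate a small value (an int literal or a variable name) to an operand."""
    if isinstance(sv, int):
        # Closed values: the translation does little.
        return IntConst(sv)
    if isinstance(sv, str):
        # Variables: the allocator chooses the location (register or stack slot).
        return allocator[sv]
    raise TypeError(f"unsupported small value: {sv!r}")
```

A variable mapped to a register translates to that register, and a variable mapped to `sp(i)` translates to the stack slot, mirroring the two variable rules.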

The  translation  of  32  bit  LIL  values  to  TILTAL  operands  is  sound  in  the  sense  that  given  a 
well-behaved  allocator  (as  defined  in  definition  18),  a  well-typed  LIL  value  translates  to  a  TILTAL 
operand  that  is  well-typed  in  the  translation  of  the  term  context  under  the  allocator.  More  precisely: 

Theorem  18  (Soundness  of  the  small  value  translation) 

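If $\Psi; \Delta; \Gamma \vdash sv : \tau$

and $\mathcal{A}$ is a good allocator for $\Gamma$

and $\Delta^{\rho_1,\rho_2} \vdash \sigma_1 : ST$

and $\Delta^{\rho_1,\rho_2} \vdash \sigma_2 : ST$

and $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : \tau \rightsquigarrow sv'$

then $|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash sv' : |\tau|$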

Proof:  (theorem  18)  By  induction  on  sv.  The  proof  proceeds  by  cases. 

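1. Suppose $\Psi; \Delta; \Gamma[x{:}\tau] \vdash x : \tau$ and $\Psi; \Delta; \Gamma[x{:}\tau]; \mathcal{A}[x \mapsto r], \sigma_1, \sigma_2 \vdash_{32} x : \tau \rightsquigarrow r$.

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash r : |\tau|$

By assumption:

$\Gamma = \Gamma_1, x{:}\tau, \Gamma_2$
$\mathcal{A}(x) = r$

So by the good allocator assumption:

$|\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}(\mathcal{A}(x)) = |\tau|$

So by the register rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash r : |\tau|$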

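2. Suppose $\Psi; \Delta; \Gamma[x{:}\tau] \vdash x : \tau$ and $\Psi; \Delta; \Gamma[x{:}\tau]; \mathcal{A}[x \mapsto sp(i)], \sigma_1, \sigma_2 \vdash_{32} x : \tau \rightsquigarrow sp(i)$.

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash sp(i) : |\tau|$

By assumption:

$\Gamma = \Gamma_1, x{:}\tau, \Gamma_2$

$\mathcal{A}(x) = sp(i)$

So by the good allocator assumption:

$|\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}(sp) = \sigma$
$\sigma[i]_{32} = |\tau|$

So by the stack slot rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash sp(i) : |\tau|$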

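3. Suppose $\Psi[\ell{:}\tau]; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} \ell : \tau \rightsquigarrow \ell$.

To show:

$|\Psi[\ell{:}\tau]|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash \ell : |\tau|$

By definition, $|\Psi[\ell{:}\tau]| = |\Psi|, \ell{:}|\tau|$, so the result follows directly by the label rule.

4. Suppose $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} i : \mathrm{Int} \rightsquigarrow i$.

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash i : \mathrm{Int}$

This follows directly by the integer rule.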

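5. Suppose $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} \mathrm{inj\_union}_{c}\ sv : c \rightsquigarrow \mathrm{inj\_union}_{|c|}\ sv'$.

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash \mathrm{inj\_union}_{|c|}\ sv' : |c|$

By inversion:

$\Delta \vdash c = \vee[\ldots, c_i, \ldots] : T_{32}$
$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : c_i \rightsquigarrow sv'$

$\Psi; \Delta; \Gamma \vdash sv : c_i$

By induction:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash sv' : |c_i|$

By theorem 17 and weakening:

$\Delta^{\rho_1,\rho_2} \vdash |c| = |\vee[\ldots, c_i, \ldots]| : T_{32}$

By construction (inj_union):

$\Delta^{\rho_1,\rho_2} \vdash \mathrm{inj\_union}_{|c|} : |c_i| \Rightarrow |c|$

By construction (coercion app):

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash \mathrm{inj\_union}_{|c|}\ sv' : |c|$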




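6. Suppose $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} \mathrm{roll}_{\tau}\ sv : \tau \rightsquigarrow \mathrm{roll}_{|\tau|}\ sv'$.

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash \mathrm{roll}_{|\tau|}\ sv' : |\tau|$

By inversion:

$\Delta \vdash \tau = \mathrm{Rec}[\kappa](c)(c_p) : T_{32}$

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : c\,(\mathrm{Rec}[\kappa]\,c)\,c_p \rightsquigarrow sv'$

$\Psi; \Delta; \Gamma \vdash sv : c\,(\mathrm{Rec}[\kappa]\,c)\,c_p$

By induction:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash sv' : |c\,(\mathrm{Rec}[\kappa]\,c)\,c_p|$

By theorem 17 and weakening:

$\Delta^{\rho_1,\rho_2} \vdash |\tau| = |\mathrm{Rec}[\kappa](c)(c_p)| : T_{32}$

Note that $|\mathrm{Rec}[\kappa](c)(c_p)| = \mathrm{Rec}[\kappa](|c|)(|c_p|)$ and $|c\,(\mathrm{Rec}[\kappa]\,c)\,c_p| = |c|\,(\mathrm{Rec}[\kappa]\,|c|)\,|c_p|$.

By construction (roll):

$\Delta^{\rho_1,\rho_2} \vdash \mathrm{roll}_{|\tau|} : (|c|\,(\mathrm{Rec}[\kappa]\,|c|)\,|c_p|) \Rightarrow |\tau|$

By construction (coercion app):

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash \mathrm{roll}_{|\tau|}\ sv' : |\tau|$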

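7. Suppose $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} \mathrm{unroll}_{\tau}\ sv : c\,(\mathrm{Rec}[\kappa]\,c)\,c_p \rightsquigarrow \mathrm{unroll}_{|\tau|}\ sv'$.

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash \mathrm{unroll}_{|\tau|}\ sv' : |c\,(\mathrm{Rec}[\kappa]\,c)\,c_p|$

By inversion:

$\Delta \vdash \tau = \mathrm{Rec}[\kappa](c)(c_p) : T_{32}$
$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : \tau \rightsquigarrow sv'$

$\Psi; \Delta; \Gamma \vdash sv : \tau$

By induction:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash sv' : |\tau|$

By theorem 17 and weakening:

$\Delta^{\rho_1,\rho_2} \vdash |\tau| = |\mathrm{Rec}[\kappa](c)(c_p)| : T_{32}$

Note that $|\mathrm{Rec}[\kappa](c)(c_p)| = \mathrm{Rec}[\kappa](|c|)(|c_p|)$ and $|c\,(\mathrm{Rec}[\kappa]\,c)\,c_p| = |c|\,(\mathrm{Rec}[\kappa]\,|c|)\,|c_p|$.

By construction (unroll):

$\Delta^{\rho_1,\rho_2} \vdash \mathrm{unroll}_{|\tau|} : |\tau| \Rightarrow (|c|\,(\mathrm{Rec}[\kappa]\,|c|)\,|c_p|)$

By construction (coercion app):

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash \mathrm{unroll}_{|\tau|}\ sv' : |c\,(\mathrm{Rec}[\kappa]\,c)\,c_p|$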

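8. Suppose $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} \mathrm{pack}\ sv\ \mathrm{as}\ \tau\ \mathrm{hiding}\ c : \tau \rightsquigarrow (\mathrm{pack}[|\tau|]\,|c|)\ sv'$.

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash (\mathrm{pack}[|\tau|]\,|c|)\ sv' : |\tau|$

By inversion:

$\Delta \vdash \tau = \exists[\kappa](c') : T_{32}$

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : c'\,c \rightsquigarrow sv'$

$\Psi; \Delta; \Gamma \vdash sv : c'\,c$

By induction:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash sv' : |c'\,c|$

By theorem 17 and weakening:

$\Delta^{\rho_1,\rho_2} \vdash |\tau| = |\exists[\kappa](c')| : T_{32}$

Note that $|\exists[\kappa]\,c'| = \exists[\kappa]\,|c'|$ and $|c\,c'| = |c|\,|c'|$.

By construction (pack):

$\Delta^{\rho_1,\rho_2} \vdash \mathrm{pack}[|\tau|]\,|c| : |c'|\,|c| \Rightarrow |\tau|$

By construction (coercion app):

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash (\mathrm{pack}[|\tau|]\,|c|)\ sv' : |\tau|$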

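9. Suppose $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv[c] : c'\,c \rightsquigarrow sv'[|c|]$.

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash sv'[|c|] : |c'\,c|$

By inversion:

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : \forall[\kappa](c') \rightsquigarrow sv'$

$\Psi; \Delta; \Gamma \vdash sv : \forall[\kappa](c')$

$\Delta \vdash c : \kappa$

By induction:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash sv' : |\forall[\kappa](c')|$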

By  theoremlS  and  weakening: 

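$\Delta^{\rho_1,\rho_2} \vdash |c| : \kappa$

Note that $|\forall[\kappa]\,c'| = \forall[\kappa]\,|c'|$ and $|c\,c'| = |c|\,|c'|$.

By construction (forall instantiation):

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash sv'[|c|] : |c'\,c|$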


64  bit  values 

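The translation rules for 64 bit values are exactly analogous to those for 32 bit values. For variables:


$\Psi; \Delta; \Gamma[x_f{:}\phi]; \mathcal{A}[x_f \mapsto f], \sigma_1, \sigma_2 \vdash_{64} x_f : \phi \rightsquigarrow f$


$\Psi; \Delta; \Gamma[x_f{:}\phi]; \mathcal{A}[x_f \mapsto sp(i)], \sigma_1, \sigma_2 \vdash_{64} x_f : \phi \rightsquigarrow sp(i)$

And for constants:

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{64} r : \mathrm{Float} \rightsquigarrow r$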

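The soundness theorem is stated and proved in much the same fashion as for 32 bit values, except that no induction is required.

Theorem 19 (Soundness of the 64 bit value translation)

If $\Psi; \Delta; \Gamma \vdash fv : \phi$

and $\mathcal{A}$ is a good allocator for $\Gamma$

and $\Delta^{\rho_1,\rho_2} \vdash \sigma_1 : ST$

and $\Delta^{\rho_1,\rho_2} \vdash \sigma_2 : ST$

and $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{64} fv : \phi \rightsquigarrow fv'$

then $|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash fv' : |\phi|$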

Proof:  By  construction.  The  proof  proceeds  by  cases  on  the  last  rule  of  the  translation  derivation. 


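1. Suppose $\Psi; \Delta; \Gamma[x_f{:}\phi]; \mathcal{A}[x_f \mapsto f], \sigma_1, \sigma_2 \vdash_{64} x_f : \phi \rightsquigarrow f$.

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash f : |\phi|$

By assumption:

$\Gamma = \Gamma_1, x_f{:}\phi, \Gamma_2$
$\mathcal{A}(x_f) = f$

So by the good allocator assumption:

$|\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}(\mathcal{A}(x_f)) = |\phi|$

So by the float register rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash f : |\phi|$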

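2. Suppose $\Psi; \Delta; \Gamma[x_f{:}\phi]; \mathcal{A}[x_f \mapsto sp(i)], \sigma_1, \sigma_2 \vdash_{64} x_f : \phi \rightsquigarrow sp(i)$.

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash sp(i) : |\phi|$

By assumption:

$\Gamma = \Gamma_1, x_f{:}\phi, \Gamma_2$

$\mathcal{A}(x_f) = sp(i)$

So by the good allocator assumption:

$|\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}(sp) = \sigma$
$\sigma[i]_{64} = |\phi|$

So by the stack slot rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash sp(i) : |\phi|$

3. Suppose $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{64} r : \mathrm{Float} \rightsquigarrow r$.

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2} \vdash r : \mathrm{Float}$

This follows immediately by construction.

8.3.2 Operations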

The translation judgement for LIL operations takes the form $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash dest_{32} \leftarrow opr : \tau \rightsquigarrow S$. Whereas the value translation relates LIL values and TILTAL operands, the operation translation relates a LIL operation $opr$ to both a TILTAL location $dest_{32}$ and a partial instruction sequence $S$ (as defined in section 7.2.1). The idea behind this relation is that given the machine state assumptions encoded in the allocator ($\mathcal{A}$), the partial instruction sequence $S$ implements the operation $opr$, leaving the result in the destination $dest_{32}$.

Structuring the translation in this manner is intended to improve the quality of the generated code by (among other things) eliminating unnecessary mov instructions. The TILT certifying implementation discussed in subsequent chapters uses this idea in a form very similar to that given here in the formal translation. A closely related approach to code generation is also taken by Dybvig et al. in their paper “Destination-Driven Code Generation” [DHB90].

Not  all  LIL  operations  are  given  translations  by  the  operation  translation.  It  simplifies  the 
structure  of  the  translation  noticeably  to  translate  only  certain  instructions  (raise,  handle,  and  the 
various  case  operations)  as  part  of  an  expression.  Consequently,  these  operations  are  treated  as 
part  of  the  expression  translation. 

In  general,  slightly  different  code  must  be  produced  when  the  destination  is  a  stack  slot  versus 
a  register.  In  order  to  keep  the  translation  simple,  I  handle  all  stack  slot  destinations  with  a  single 
rule. 

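\[
\frac{\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash r_t \leftarrow opr : \tau \rightsquigarrow S}
     {\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash sp(i) \leftarrow opr : \tau \rightsquigarrow S;\ \mathtt{swrite}\ sp(i), r_t;\ \mathtt{junk}\ r_t}
\]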

This  rule  simply  translates  the  operation  using  the  temporary  register  as  the  destination,  then 
writes  the  value  of  the  temporary  register  to  the  appropriate  stack  slot  and  clears  the  temporary 
register.  In  some  cases,  better  code  could  be  produced  by  adding  additional  rules  for  specific 
operation/destination  pairs  in  which  the  intermediate  usage  of  the  temporary  register  could  be 
eliminated. 
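
The stack slot rule can be read as a small wrapper around the register-destination translation. The sketch below is an illustration in Python under stated assumptions, not the TILT code generator: instructions are plain strings, `translate_to_register` is a hypothetical stand-in that only handles a constant-loading operation, and destinations are encoded as `("reg", name)` or `("sp", i)`.

```python
TEMP = "rt"  # the temporary register used by the stack slot rule


def translate_to_register(dest_reg, opr):
    """Hypothetical stand-in for the register-destination rules.

    Only an operation of the form ("const", n) is supported here.
    """
    kind, payload = opr
    if kind == "const":
        return [f"mov {dest_reg}, {payload}"]
    raise NotImplementedError(kind)


def translate_operation(dest, opr):
    """Translate `opr`, leaving its result in `dest`."""
    if dest[0] == "reg":
        return translate_to_register(dest[1], opr)
    if dest[0] == "sp":
        # Single rule for every stack slot destination: compute into the
        # temporary register, write it to the slot, then junk the temporary.
        slot = dest[1]
        code = translate_to_register(TEMP, opr)
        return code + [f"swrite sp({slot}), {TEMP}", f"junk {TEMP}"]
    raise ValueError(f"unknown destination: {dest!r}")
```

For instance, `translate_operation(("sp", 3), ("const", 5))` yields the register translation followed by `swrite sp(3), rt` and `junk rt`, which is the shape of the conclusion of the rule.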

For  the  inclusion  of  small  values  into  the  operation  level,  the  translation  produces  an  operand 
from  the  small  value  and  moves  it  into  the  destination  register. 

\[
\frac{\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : \tau \rightsquigarrow sv'}
     {\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash r \leftarrow sv : \tau \rightsquigarrow \mathtt{mov}\ r, sv'}
\]

Note that in many cases, a good allocator will have assigned $r$ as the location for $sv$ (for example, when $sv$ is a variable which is not live after the operation). In such cases, unnecessary move instructions will be generated. As above, additional translation rules can be added to produce better code for such cases.

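A more interesting rule to consider is the translation rule for the call operation, which invokes a code function on a list of 64 and 32 bit arguments. Code calls are implemented by writing the code arguments onto the stack and then calling the translated code sequence. Upon return, it is necessary to move the result from the temporary register to the destination register (since the calling convention specifies that function results are returned in $r_t$). Note the use of the movj pseudo-instruction to ensure correctness in the case that the destination register is $r_t$.

\[
\frac{
\begin{array}{l}
|\Gamma|_{\mathcal{A}} = \{r_1{:}ns_{32}, r_2{:}ns_{32}, r_e{:}ns_{32}, r_t{:}ns_{32}, f_1{:}ns_{64}, f_2{:}ns_{64}, sp{:}\sigma_f\} \\
\Delta^{\rho_1,\rho_2} \vdash \sigma_f = \underbrace{ns_{32} \rhd_{32} \cdots \rhd_{32} ns_{32}}_{m} \rhd_{32} \underbrace{ns_{64} \rhd_{64} \cdots ns_{64}}_{k} \rhd_{64} \sigma' : ST \\
\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : \mathrm{Code}(\tau_0, \ldots, \tau_{m-1})(\phi_0, \ldots, \phi_{k-1}) \to \tau \rightsquigarrow sv' \\
\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_i : \tau_i \rightsquigarrow sv_i' \qquad i \in 0 \ldots m-1 \\
\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{64} fv_i : \phi_i \rightsquigarrow fv_i' \qquad i \in 0 \ldots k-1
\end{array}}
{\begin{array}{l}
\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash r \leftarrow \mathrm{call}\ sv(sv_0, \ldots, sv_{m-1})(fv_0, \ldots, fv_{k-1}) : \tau \rightsquigarrow \\
\quad \mathtt{fswrite}\ sp(m + 2*(k-1)), fv_{k-1}' \\
\quad \vdots \\
\quad \mathtt{fswrite}\ sp(m), fv_0' \\
\quad \mathtt{swrite}\ sp(m-1), sv_{m-1}' \\
\quad \vdots \\
\quad \mathtt{swrite}\ sp(0), sv_0' \\
\quad \mathtt{junkregs} \\
\quad \mathtt{call}\ sv'[\sigma' \circ \sigma_1, \sigma_2] \\
\quad \mathtt{movj}\ r, r_t
\end{array}}
\]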

This  rule  illustrates  several  important  ideas  used  in  the  translation.  Firstly,  note  that  the 
translation  requires  that  the  translation  of  the  typing  context  under  the  allocator  be  a  register  file 
in  which  all  of  the  registers  contain  only  junk.  This  reflects  the  fact  that  all  registers  in  this  calling 
convention  are  caller-save,  and  hence  may  be  overwritten  during  the  function  call.  A  translation 
using  a  callee-save  convention  could  allow  some  or  all  of  the  registers  to  be  occupied.  The  implicit 
effect  of  this  rule  is  to  force  any  valid  translation  to  spill  registers  using  the  spill  rule  before  applying 
a  call  rule  (or  to  never  map  registers  at  all).  This  demonstrates  a  key  advantage  of  defining  the 
translation  parametrically  with  respect  to  allocation:  there  is  a  clean  separation  of  concerns  between 
the  expectations  of  the  translator  and  the  mechanism  by  which  the  register  allocator  satisfies  those 
expectations. 

Secondly,  note  that  the  translation  expects  the  allocator  to  include  space  for  the  out-arguments 
in  the  frame.  This  is  not  required  by  the  allocator  methodology,  but  is  convenient  for  the  purposes 
of  avoiding  a  frame  pointer,  as  well  as  permitting  the  allocator  to  potentially  re-use  temporary 
slots  as  out-arguments.  The  translation  expresses  this  requirement  on  the  allocator  by  premising 
the  translation  of  a  call  operation  on  the  availability  of  sufficient  unused  slots  on  the  top  of  the 
frame: 

\[
\Delta^{\rho_1,\rho_2} \vdash \sigma_f = \underbrace{ns_{32} \rhd_{32} \cdots \rhd_{32} ns_{32}}_{m} \rhd_{32} \underbrace{ns_{64} \rhd_{64} \cdots ns_{64}}_{k} \rhd_{64} \sigma' : ST
\]

A  translation  only  exists  if  the  allocator  provides  sufficient  space  in  the  frame. 
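
The shape of the emitted call sequence, and the fact that no translation exists without enough free out-argument slots, can be sketched as follows. This is an illustrative Python sketch rather than the thesis's implementation: `emit_call` and its parameters are hypothetical, operands are plain strings, the stack instantiation on the call target is omitted, and the assumption that each 64 bit argument occupies two 32 bit slots is inferred from the `m + 2*(k-1)` offset in the rule.

```python
from typing import List, Optional


def emit_call(dest: str, fn: str, args32: List[str], args64: List[str],
              free_slots: int) -> Optional[List[str]]:
    """Return the instruction sequence for a call, or None if the frame
    does not provide enough unused slots for the out-arguments."""
    m, k = len(args32), len(args64)
    if free_slots < m + 2 * k:
        # A translation only exists if the allocator left sufficient space.
        return None
    code = []
    # 64 bit arguments sit above the 32 bit ones, two slots apiece.
    for j in reversed(range(k)):
        code.append(f"fswrite sp({m + 2 * j}), {args64[j]}")
    for i in reversed(range(m)):
        code.append(f"swrite sp({i}), {args32[i]}")
    code.append("junkregs")              # all registers are caller-save
    code.append(f"call {fn}")
    code.append(f"movj {dest}, rt")      # results are returned in rt
    return code
```

With two 32 bit arguments and one 64 bit argument, four free slots suffice and three do not.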

This rule also illustrates the use of the derived movj partial instruction sequence from section 7.2.1. The technique used to handle stack slot destinations above implies that the translation must be prepared for the destination register to be an alias for the temporary register. This is handled here by the movj instruction, which performs a move on its argument registers and junks the source register only if the registers are different. Consequently, if $r = r_t$, then the result is left in $r_t$; and if $r \neq r_t$, the result is moved to $r$ and the register $r_t$ is junked.

The rest of the operation translation rules proceed in a similar fashion. The complete translation rules can be found in section 8.4 below.
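
The behaviour of movj described above can be summarised in a few lines. The Python sketch below is an assumption-laden illustration: `expand_movj` is a hypothetical helper, instructions are strings, and the aliased case is modelled as an empty sequence, which is one way of leaving the result in $r_t$ (the definition in section 7.2.1 is authoritative).

```python
def expand_movj(dest: str, src: str) -> list:
    """Expand the movj pseudo-instruction into primitive instructions."""
    if dest == src:
        # Destination aliases the source: the result is already in place,
        # and junking the source would destroy it.
        return []
    # Otherwise move the value and junk the now-dead source register.
    return [f"mov {dest}, {src}", f"junk {src}"]
```

The aliased case arises exactly when the stack slot rule has chosen the temporary register as the destination of the call.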

8.3.3  Soundness  of  the  operation  translations 

The soundness theorem for the operation translation states that for a well-typed operation and a good allocator, whenever the operation is related to a destination and a partial code sequence by the translation, the partial code sequence is well-formed and leaves its result in the destination. Note that the translation assumes that the register which is left unmapped by the allocator will contain an exception frame.

Theorem  20  (Soundness  of  the  operation  translation.) 

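If $\Psi; \Delta; \Gamma; \mathcal{A} \vdash opr : \tau\ \mathrm{opr}_{32}$ and $\mathcal{A}$ is a good allocator for $\Gamma$ and $\Delta^{\rho_1,\rho_2} \vdash \sigma_1 : ST$ and $\Delta^{\rho_1,\rho_2} \vdash \sigma_2 : ST$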

then: 

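1. If $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash r \leftarrow opr : \tau \rightsquigarrow S$
then

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash S \Rightarrow \Gamma_{\mathcal{A}}\{r{:}|\tau|\}$

where $|\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}\{r_e{:}\mathrm{Exnptr}(\sigma_1)\} = \Gamma_{\mathcal{A}}$

2. If $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash sp(i) \leftarrow opr : \tau \rightsquigarrow S$
then

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash S \Rightarrow \Gamma_{\mathcal{A}}\{sp{:}\sigma'\}$

where $|\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}\{r_e{:}\mathrm{Exnptr}(\sigma_1)\} = \Gamma_{\mathcal{A}}$ and $\Gamma_{\mathcal{A}}(sp) = \sigma$ and $(\sigma)[i]_{32} \leftarrow |\tau| = \sigma'$.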

Proof: By induction on derivations. The proof proceeds by cases on the last rule used. Note that only one rule applies when the destination is a stack slot, and that when the destination is a register, at most one rule applies for each instruction form. Also note that the good allocator assumption guarantees that $|\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}(r_t) = ns_{32}$, and hence for cases that modify $r_t$, the output typing condition requires that $r_t$ be coerced to $ns_{32}$.

Throughout the proof, I use $\Gamma_{\mathcal{A}}$ to refer to $|\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}\{r_e{:}\mathrm{Exnptr}(\sigma_1)\}$.

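1. Suppose $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash sp(i) \leftarrow opr : \tau \rightsquigarrow S$. The stack slot rule is the only rule that applies, so

By inversion:

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash r_t \leftarrow opr : \tau \rightsquigarrow S$

Let $\sigma = \Gamma_{\mathcal{A}}(sp)$.

By induction:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash S \Rightarrow \Gamma_{\mathcal{A}}\{r_t{:}|\tau|\}$

By the stack write rule and lemma 34:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}}\{r_t{:}|\tau|\} \vdash \mathtt{swrite}\ sp(i), r_t;\ \mathtt{junk}\ r_t \Rightarrow \Gamma_{\mathcal{A}}\{sp{:}(\sigma)[i]_{32} \leftarrow |\tau|\}\{r_t{:}ns_{32}\}$

So by lemma 28 (composition):

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash S;\ \mathtt{swrite}\ sp(i), r_t;\ \mathtt{junk}\ r_t \Rightarrow \Gamma_{\mathcal{A}}\{sp{:}(\sigma)[i]_{32} \leftarrow |\tau|\}\{r_t{:}ns_{32}\}$

2. Suppose $opr = sv$.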

By  assumption: 

sv  :  T  oprgj 

A,r,^|-r^s?;:r'^  mov  r,  sv' 

By  inversion: 

A;  B;  ^  h  sv  :t 
A,  r,  A  I-32  sv  :t  sv' 

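To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{mov}\ r, sv' \Rightarrow \Gamma_{\mathcal{A}}\{r{:}|\tau|\}$

By theorem 18:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv' : |\tau|$

By the mov typing rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{mov}\ r, sv' \Rightarrow \Gamma_{\mathcal{A}}\{r{:}|\tau|\}$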

3.  Suppose  opr  =  select  sv 
By  assumption: 

T;  A;  T;  ^  h  select  sv  :  Ti  oprgj 

A,  r,  ^  h  r  ^  select  sv  :Ti'^  mov  r,  sv' 

loadr  r,  r(i) 

By  inversion: 

T;  A;  r  h  s?; :  x  (tq,  . . . ,  tj, . . . ,  Tn) 

A,r,^h32  sv  :  X  {to,  .  .  .  ,Ti,  .  .  .  ,Tn)  Sv' 

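To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{mov}\ r, sv';\ \mathtt{loadr}\ r, r(i) \Rightarrow \Gamma_{\mathcal{A}}\{r{:}|\tau_i|\}$

By theorem 18:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv' : |\times(\tau_0, \ldots, \tau_i, \ldots, \tau_n)|$

By definition:

$|\times(\tau_0, \ldots, \tau_i, \ldots, \tau_n)| = \times(|\tau_0|, \ldots, |\tau_i|, \ldots, |\tau_n|)$

By the mov and load typing rules:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{mov}\ r, sv';\ \mathtt{loadr}\ r, r(i) \Rightarrow \Gamma_{\mathcal{A}}\{r{:}|\tau_i|\}$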

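4. Suppose $opr = \mathrm{dyntag}_{c}$.

By the new tag instruction rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{dyntag}_{|c|}\ r \Rightarrow \Gamma_{\mathcal{A}}\{r{:}|\mathrm{Dyntag}(c)|\}$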

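5. Suppose $opr = \mathrm{box}\ fv$.

By inversion:

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{64} fv : \mathrm{Float} \rightsquigarrow fv'$

$\Psi; \Delta; \Gamma \vdash fv : \mathrm{Float}$

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{malloc}_{|\phi|}\ r, fv' \Rightarrow \Gamma_{\mathcal{A}}\{r{:}|\mathrm{Boxed}(\phi)|\}$

By assumption, $\mathcal{A}$ is a good allocator for $\Gamma$, so theorem 19 applies to $fv'$.

By the malloc rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{malloc}_{|\phi|}\ r, fv' \Rightarrow \Gamma_{\mathcal{A}}\{r{:}\mathrm{Boxed}(|\phi|)\}$

(Note: $|\mathrm{Boxed}(\phi)| = \mathrm{Boxed}(|\phi|)$.)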

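6. Suppose $opr = \langle sv_1, \ldots, sv_n \rangle$.

By inversion:

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_i : \tau_i \rightsquigarrow sv_i'$

$\Psi; \Delta; \Gamma \vdash sv_i : \tau_i$

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{malloc}\ r, [|\tau_1|, \ldots, |\tau_n|]\langle sv_1', \ldots, sv_n' \rangle \Rightarrow \Gamma_{\mathcal{A}}\{r{:}|\tau_\times|\}$
where $\tau_\times = \times[\tau_1, \ldots, \tau_n]$

By theorem 18:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_i' : |\tau_i|$

So the result follows directly by the malloc rule.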

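7. Suppose $opr = \mathrm{call}\ sv(sv_0, \ldots, sv_{m-1})(fv_0, \ldots, fv_{k-1})$.

By inversion:

$|\Gamma|_{\mathcal{A}} = \{r_1{:}ns_{32}, r_2{:}ns_{32}, r_e{:}ns_{32}, r_t{:}ns_{32}, f_1{:}ns_{64}, f_2{:}ns_{64}, sp{:}\sigma_f\}$

$\Delta^{\rho_1,\rho_2} \vdash \sigma_f = \underbrace{ns_{32} \rhd_{32} \cdots \rhd_{32} ns_{32}}_{m} \rhd_{32} \underbrace{ns_{64} \rhd_{64} \cdots ns_{64}}_{k} \rhd_{64} \sigma' : ST$

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : \mathrm{Code}(\tau_0, \ldots, \tau_{m-1})(\phi_0, \ldots, \phi_{k-1}) \to \tau \rightsquigarrow sv'$

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_i : \tau_i \rightsquigarrow sv_i' \qquad i \in 0 \ldots m-1$

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{64} fv_i : \phi_i \rightsquigarrow fv_i' \qquad i \in 0 \ldots k-1$

$\Psi; \Delta; \Gamma \vdash sv : \mathrm{Code}(\tau_0, \ldots, \tau_{m-1})(\phi_0, \ldots, \phi_{k-1}) \to \tau$

$\Psi; \Delta; \Gamma \vdash sv_i : \tau_i$

To show:

\[
\begin{array}{l}
|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \\
\quad \mathtt{fswrite}\ sp(m + 2*(k-1)), fv_{k-1}' \\
\quad \vdots \\
\quad \mathtt{fswrite}\ sp(m), fv_0' \\
\quad \mathtt{swrite}\ sp(m-1), sv_{m-1}' \\
\quad \vdots \\
\quad \mathtt{swrite}\ sp(0), sv_0' \\
\quad \mathtt{call}\ sv'[\sigma' \circ \sigma_1, \sigma_2] \\
\quad \mathtt{movj}\ r, r_t \\
\Rightarrow \Gamma_{\mathcal{A}}\{r{:}|\tau|\}
\end{array}
\]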

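The proof proceeds by stepping through the emitted instructions and applying the appropriate typing rules. For brevity, I will simply describe the register file type after each instruction rather than restating the entire typing judgement. I also leave the inner inductions on $m$ and $k$ informal.

By theorems 19 and 18:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_i' : |\tau_i|$

By the good allocator assumption, $\Gamma_{\mathcal{A}}(sp) = \sigma_f \circ \sigma_1 \circ \sigma_2$.

By assumption:

$\Delta^{\rho_1,\rho_2} \vdash \sigma_f = \underbrace{ns_{32} \rhd_{32} \cdots \rhd_{32} ns_{32}}_{m} \rhd_{32} \underbrace{ns_{64} \rhd_{64} \cdots ns_{64}}_{k} \rhd_{64} \sigma' : ST$ where $\sigma_f = |\Gamma|_{\mathcal{A}}(sp)$

Therefore, by the swrite and fswrite rules, it suffices to show that the remainder of the instruction sequence after the stack writes is well-typed assuming that the register file has the type:

$\Gamma_{\mathcal{A}}\{sp{:}\sigma_c\}$ where $\sigma_c = \tau_0 \rhd_{32} \cdots \rhd_{32} \tau_{m-1} \rhd_{32} \phi_0 \rhd_{64} \cdots \phi_{k-1} \rhd_{64} \sigma' \circ \sigma_1 \circ \sigma_2$

By theorem 18:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv' : |\mathrm{Code}(\tau_0, \ldots, \tau_{m-1})(\phi_0, \ldots, \phi_{k-1}) \to \tau|$

And by definition 14:

\[
\begin{array}{l}
|\mathrm{Code}(\tau_0, \ldots, \tau_{m-1})(\phi_0, \ldots, \phi_{k-1}) \to \tau| = \\
\quad \forall[\rho_1{:}ST, \rho_2{:}ST]. \\
\quad \{r_1{:}ns_{32}, r_2{:}ns_{32}, f_1{:}ns_{64}, f_2{:}ns_{64}, r_e{:}\mathrm{Exnptr}(\rho_2), r_t{:}ns_{32}, \\
\quad\ sp{:}\mathrm{cont}(|\sigma_{32} \circ \sigma_{64}|)(\rho_1)(\rho_2)(|\tau|) \rhd_{32} (\sigma_{32} \circ (\sigma_{64} \circ \rho_1 \circ \rho_2))\} \to 0 \\
\text{where } \sigma_{32} = \tau_0 \rhd_{32} \cdots \rhd_{32} \tau_{m-1} \rhd_{32} \epsilon \\
\text{and } \sigma_{64} = \phi_0 \rhd_{64} \cdots \phi_{k-1} \rhd_{64} \epsilon
\end{array}
\]

So by the instantiation rule, $sv'[\sigma' \circ \sigma_1, \sigma_2]$ has type:

$\Gamma_c \to 0$ where

\[
\begin{array}{l}
\Gamma_c = \{r_1{:}ns_{32}, r_2{:}ns_{32}, f_1{:}ns_{64}, f_2{:}ns_{64}, r_e{:}\mathrm{Exnptr}(\sigma_2), r_t{:}ns_{32}, \\
\quad sp{:}\mathrm{cont}(|\sigma_{32} \circ \sigma_{64}|)(\sigma' \circ \sigma_1)(\sigma_2)(|\tau|) \rhd_{32} (\sigma_{32} \circ (\sigma_{64} \circ \sigma' \circ \sigma_1 \circ \sigma_2))\}
\end{array}
\]

Let $\tau_{ret} = \mathrm{cont}(|\sigma_{32} \circ \sigma_{64}|)(\sigma' \circ \sigma_1)(\sigma_2)(|\tau|)$

Note that $\Gamma_c = \Gamma_{\mathcal{A}}\{sp{:}\tau_{ret} \rhd_{32} \sigma_c\}$ and so the call rule applies. Therefore, it suffices to show that the remainder of the code sequence is well-typed under $\Gamma_{ret}$, where $\tau_{ret} = \Gamma_{ret} \to 0$.

By lemma 38, the register file after applying the last move and junk instruction is

$\Gamma_{ret}\{r_t{:}ns_{32}\}\{r{:}|\tau|\}$


But notice that by definition of cont (definition 13):

\[
\begin{array}{l}
\Gamma_{ret} = \{r_1{:}ns_{32}, r_2{:}ns_{32}, f_1{:}ns_{64}, f_2{:}ns_{64}, r_e{:}\mathrm{Exnptr}(\sigma_2), r_t{:}|\tau|, \\
\quad sp{:}ns_{32}^{m+2*k} \circ \sigma' \circ \sigma_1 \circ \sigma_2\}
\end{array}
\]

So $\Gamma_{ret} = \Gamma_{\mathcal{A}}\{r_t{:}|\tau|\}$.

Finally, recall that by the good allocator assumption:
$\Gamma_{\mathcal{A}}\{r_t{:}ns_{32}\} = \Gamma_{\mathcal{A}}$

Hence:

$\Gamma_{ret}\{r_t{:}ns_{32}\}\{r{:}|\tau|\}$

$= \Gamma_{\mathcal{A}}\{r_t{:}|\tau|\}\{r_t{:}ns_{32}\}\{r{:}|\tau|\}$

$= \Gamma_{\mathcal{A}}\{r_t{:}ns_{32}\}\{r{:}|\tau|\}$

$= \Gamma_{\mathcal{A}}\{r{:}|\tau|\}$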

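8. Suppose $opr = \mathrm{array}_{\tau}(sv_1, sv_2)$.

By inversion:

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_1 : \mathrm{Int} \rightsquigarrow sv_1'$

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_2 : \tau \rightsquigarrow sv_2'$
$\Psi; \Delta; \Gamma \vdash sv_1 : \mathrm{Int}$
$\Psi; \Delta; \Gamma \vdash sv_2 : \tau$

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{malloc}_{|\tau|}\ r, sv_1', sv_2' \Rightarrow \Gamma_{\mathcal{A}}\{r{:}|\mathrm{Array}_{32}(\tau)|\}$

By theorem 18:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_1' : |\mathrm{Int}|$

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_2' : |\tau|$

so by the malloc rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{malloc}_{|\tau|}\ r, sv_1', sv_2' \Rightarrow \Gamma_{\mathcal{A}}\{r{:}\mathrm{Array}_{32}(|\tau|)\}$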

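9. Suppose $opr = \mathrm{farray}_{\phi}(sv, fv)$.

By inversion:

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : \mathrm{Int} \rightsquigarrow sv'$

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{64} fv : \phi \rightsquigarrow fv'$

$\Psi; \Delta; \Gamma \vdash sv : \mathrm{Int}$
$\Psi; \Delta; \Gamma \vdash fv : \phi$

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{fmalloc}_{|\phi|}\ r, sv', fv' \Rightarrow \Gamma_{\mathcal{A}}\{r{:}|\mathrm{Array}_{64}(\phi)|\}$

By theorems 18 and 19:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv' : |\mathrm{Int}|$

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash fv' : |\phi|$

so by the fmalloc rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{fmalloc}_{|\phi|}\ r, sv', fv' \Rightarrow \Gamma_{\mathcal{A}}\{r{:}|\mathrm{Array}_{64}(\phi)|\}$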

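10. Suppose $opr = \mathrm{sub}_{\tau}(sv_1, sv_2)$.

By inversion:

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_1 : \mathrm{Array}_{32}(\tau) \rightsquigarrow sv_1'$

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_2 : \mathrm{Int} \rightsquigarrow sv_2'$
$\Psi; \Delta; \Gamma \vdash sv_1 : \mathrm{Array}_{32}(\tau)$

$\Psi; \Delta; \Gamma \vdash sv_2 : \mathrm{Int}$

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{sub}_{|\tau|}\ r, sv_1', sv_2' \Rightarrow \Gamma_{\mathcal{A}}\{r{:}|\tau|\}$

By theorem 18:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_1' : |\mathrm{Array}_{32}(\tau)|$

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_2' : |\mathrm{Int}|$

so by the sub rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{sub}_{|\tau|}\ r, sv_1', sv_2' \Rightarrow \Gamma_{\mathcal{A}}\{r{:}|\tau|\}$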

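11. Suppose $opr = \mathrm{upd}_{\tau}(sv_1, sv_2, sv_3)$.

By inversion:

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_1 : \mathrm{Array}_{32}(\tau) \rightsquigarrow sv_1'$
$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_2 : \mathrm{Int} \rightsquigarrow sv_2'$
$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_3 : \tau \rightsquigarrow sv_3'$

$\Psi; \Delta; \Gamma \vdash sv_1 : \mathrm{Array}_{32}(\tau)$

$\Psi; \Delta; \Gamma \vdash sv_2 : \mathrm{Int}$
$\Psi; \Delta; \Gamma \vdash sv_3 : \tau$

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{upd}\ sv_1', sv_2', sv_3';\ \mathtt{malloc}\ r, []() \Rightarrow \Gamma_{\mathcal{A}}\{r{:}\mathrm{Unit}\}$

By theorem 18:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_1' : |\mathrm{Array}_{32}(\tau)|$

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_2' : |\mathrm{Int}|$

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_3' : |\tau|$

So by the upd rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{upd}\ sv_1', sv_2', sv_3' \Rightarrow \Gamma_{\mathcal{A}}$

And by the malloc rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{malloc}\ r, []() \Rightarrow \Gamma_{\mathcal{A}}\{r{:}\mathrm{Unit}\}$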

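12. Suppose $opr = \mathrm{fupd}_{\phi}(sv_1, sv_2, fv)$.

By inversion:

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_1 : \mathrm{Array}_{64}(\phi) \rightsquigarrow sv_1'$

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_2 : \mathrm{Int} \rightsquigarrow sv_2'$
$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{64} fv : \phi \rightsquigarrow fv'$

$\Psi; \Delta; \Gamma \vdash sv_1 : \mathrm{Array}_{64}(\phi)$

$\Psi; \Delta; \Gamma \vdash sv_2 : \mathrm{Int}$
$\Psi; \Delta; \Gamma \vdash fv : \phi$

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{fupd}\ sv_1', sv_2', fv';\ \mathtt{malloc}\ r, []() \Rightarrow \Gamma_{\mathcal{A}}\{r{:}\mathrm{Unit}\}$

By theorems 18 and 19:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_1' : |\mathrm{Array}_{64}(\phi)|$

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_2' : |\mathrm{Int}|$

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash fv' : |\phi|$

So by the fupd rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{fupd}\ sv_1', sv_2', fv' \Rightarrow \Gamma_{\mathcal{A}}$

And by the malloc rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{malloc}\ r, []() \Rightarrow \Gamma_{\mathcal{A}}\{r{:}\mathrm{Unit}\}$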
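
The register-file bookkeeping that closes the call case (case 7) of the proof above can be checked mechanically on a small model. The Python sketch below is only an illustration under stated assumptions: register files are modelled as dictionaries, the update notation $\Gamma\{r{:}\tau\}$ as a functional override, types as opaque strings, and the destination is taken to be `r1`; `update` is a hypothetical helper.

```python
def update(regfile, **changes):
    """Functional register-file update: later bindings override earlier ones."""
    return {**regfile, **changes}


# Good allocator assumption: the temporary register rt holds junk (ns32).
gamma_a = {"r1": "ns32", "r2": "ns32", "rt": "ns32", "sp": "sigma"}
tau = "|tau|"

# Register file on return from the call: the result sits in rt.
gamma_ret = update(gamma_a, rt=tau)

# Left-hand side of the chain: junk rt again, then bind the destination r1.
lhs = update(update(gamma_ret, rt="ns32"), r1=tau)

# Right-hand side: simply bind the destination in the original register file.
rhs = update(gamma_a, r1=tau)

chain_holds = lhs == rhs
```

The equality relies on `rt` already being `ns32` in the starting register file, which is exactly what the good allocator assumption supplies in the proof.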


A useful corollary states that the state of the machine after executing the translation of an operation is the same as the machine state given by translating the extended context.

Corollary  3  (Operation  context  extension.) 

If  'h;  A;  h  opr  :r  oprgj 

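and $\mathcal{A}$ is a good allocator for $\Gamma$ and for $\Gamma, x{:}\tau$
and $\Delta^{\rho_1,\rho_2} \vdash \sigma_1 : ST$

and $\Delta^{\rho_1,\rho_2} \vdash \sigma_2 : ST$

and $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash \mathcal{A}(x) \leftarrow opr : \tau \rightsquigarrow S$

then

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash S \Rightarrow \Gamma_B$

where $\Gamma_{\mathcal{A}} = |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}\{r_e{:}\mathrm{Exnptr}(\sigma_1)\}$ and $\Gamma_B = |\Gamma, x{:}\tau|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}\{r_e{:}\mathrm{Exnptr}(\sigma_1)\}$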

Proof:  The  proof  follows  trivially  by  applying  theorem  20  and  observing  that  the  good  allocator 
assumption  implies  that  the  resulting  context  is  equal  to  the  translation  of  the  extended  context. 

■ 

The  proof  of  soundness  for  the  64  bit  operation  translation  follows  exactly  the  same  form  as 
that  of  the  32  bit  operation  translation. 

Theorem  21  (Soundness  of  the  64  bit  operation  translation.) 

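If $\Psi; \Delta; \Gamma; \mathcal{A} \vdash fopr : \phi\ \mathrm{opr}_{64}$ and $\mathcal{A}$ is a good allocator for $\Gamma$ and $\Delta^{\rho_1,\rho_2} \vdash \sigma_1 : ST$ and $\Delta^{\rho_1,\rho_2} \vdash \sigma_2 : ST$ then: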

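1. If $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash f \leftarrow fopr : \phi \rightsquigarrow S$

then

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash S \Rightarrow \Gamma_{\mathcal{A}}\{f{:}|\phi|\}$

where $|\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}\{r_e{:}\mathrm{Exnptr}(\sigma_1)\} = \Gamma_{\mathcal{A}}$

2. If $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash sp(i) \leftarrow fopr : \phi \rightsquigarrow S$
then

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash S \Rightarrow \Gamma_{\mathcal{A}}\{sp{:}\sigma'\}$

where $|\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}\{r_e{:}\mathrm{Exnptr}(\sigma_1)\} = \Gamma_{\mathcal{A}}$ and $\Gamma_{\mathcal{A}}(sp) = \sigma$ and $(\sigma)[i]_{64} \leftarrow |\phi| = \sigma'$.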

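Proof: By induction on derivations. The proof proceeds by cases on the last rule used. Note that only one rule applies when the destination is a stack slot, and that when the destination is a register, at most one rule applies for each instruction form. Also note that the good allocator assumption guarantees that $|\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}(f_t) = ns_{64}$, and hence for cases that modify $f_t$, the output typing condition requires that $f_t$ be coerced to $ns_{64}$.

Throughout the proof, I use $\Gamma_{\mathcal{A}}$ to refer to $|\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}\{r_e{:}\mathrm{Exnptr}(\sigma_1)\}$.

1. Suppose $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash sp(i) \leftarrow fopr : \phi \rightsquigarrow S$. The stack slot rule is the only rule that applies, so

By inversion:

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash f_t \leftarrow fopr : \phi \rightsquigarrow S$

Let $\sigma = \Gamma_{\mathcal{A}}(sp)$.

By induction:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash S \Rightarrow \Gamma_{\mathcal{A}}\{f_t{:}|\phi|\}$

By the stack write rule and lemma 34:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}}\{f_t{:}|\phi|\} \vdash \mathtt{fswrite}\ sp(i), f_t;\ \mathtt{junk}\ f_t \Rightarrow \Gamma_{\mathcal{A}}\{sp{:}(\sigma)[i]_{64} \leftarrow |\phi|\}\{f_t{:}ns_{64}\}$

So by lemma 28 (composition):

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash S;\ \mathtt{fswrite}\ sp(i), f_t;\ \mathtt{junk}\ f_t \Rightarrow \Gamma_{\mathcal{A}}\{sp{:}(\sigma)[i]_{64} \leftarrow |\phi|\}\{f_t{:}ns_{64}\}$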

2.  Suppose  fopr  =  fv. 

By  assumption: 

4';A;r;^h/?;:(()oprg4 
'k;A;r;^,  (Ti,cJ2l“f  ^  fv  :t  ^  f  mov  f,fv' 

By  inversion: 

^;A-,T;Ah  fv.cl) 

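To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{fmov}\ f, fv' \Rightarrow \Gamma_{\mathcal{A}}\{f{:}|\phi|\}$

By theorem 19:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash fv' : |\phi|$

By the fmov typing rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{fmov}\ f, fv' \Rightarrow \Gamma_{\mathcal{A}}\{f{:}|\phi|\}$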

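3. Suppose $fopr = \mathrm{fsub}_{\phi}(sv_1, sv_2)$.

By inversion:

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_1 : \mathrm{Array}_{64}(\phi) \rightsquigarrow sv_1'$

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_2 : \mathrm{Int} \rightsquigarrow sv_2'$
$\Psi; \Delta; \Gamma \vdash sv_1 : \mathrm{Array}_{64}(\phi)$
$\Psi; \Delta; \Gamma \vdash sv_2 : \mathrm{Int}$

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{sub}_{|\phi|}\ f, sv_1', sv_2' \Rightarrow \Gamma_{\mathcal{A}}\{f{:}|\phi|\}$

By theorem 18:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_1' : |\mathrm{Array}_{64}(\phi)|$

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_2' : \mathrm{Int}$

so by the fsub rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{sub}_{|\phi|}\ f, sv_1', sv_2' \Rightarrow \Gamma_{\mathcal{A}}\{f{:}|\phi|\}$

4. Suppose $fopr = \mathrm{unbox}\ sv$.

By inversion:

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : \mathrm{Boxed}(\phi) \rightsquigarrow sv'$

$\Psi; \Delta; \Gamma \vdash sv : \mathrm{Boxed}(\phi)$

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{floadr}\ f, sv' \Rightarrow \Gamma_{\mathcal{A}}\{f{:}|\phi|\}$

By theorem 18:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv' : |\mathrm{Boxed}(\phi)|$

so by the floadr rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{floadr}\ f, sv' \Rightarrow \Gamma_{\mathcal{A}}\{f{:}|\phi|\}$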


As before, a useful corollary states that the state of the machine after executing the translation of a 64 bit operation is the same as the machine state given by translating the extended context.

Corollary  4  (Operation  context  extension.) 

If  4^;  A;  T;  Ah /opr  :(/)  opr32 

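and $\mathcal{A}$ is a good allocator for $\Gamma$ and for $\Gamma, x_f{:}\phi$
and $\Delta^{\rho_1,\rho_2} \vdash \sigma_1 : ST$

and $\Delta^{\rho_1,\rho_2} \vdash \sigma_2 : ST$

and $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash \mathcal{A}(x_f) \leftarrow fopr : \phi \rightsquigarrow S$

then

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash S \Rightarrow \Gamma_B$

where $\Gamma_{\mathcal{A}} = |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}\{r_e{:}\mathrm{Exnptr}(\sigma_1)\}$ and $\Gamma_B = |\Gamma, x_f{:}\phi|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}\{r_e{:}\mathrm{Exnptr}(\sigma_1)\}$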

Proof:  The  proof  follows  trivially  by  applying  theorem  21  and  observing  that  the  good  allocator 
assumption  implies  that  the  resulting  context  is  equal  to  the  translation  of  the  extended  context. 


8.3.4  Expressions 

The general expression translation judgement takes the form $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{C} e : \tau \rightsquigarrow I; F$. LIL expressions translate to TILTAL instruction sequences $I$, terminated either by a return, a jump, or a halt, depending on the context of occurrence of the expression. In addition, the translation of nested sub-expressions may produce a heap fragment $F$ containing additional code blocks which must be allocated in the heap at the top level. All new labels generated by the translation are assumed to be fresh.

Occurrence  parameters 

As  usual,  the  translation  relies  on  an  allocator  and  two  stack  types  describing  the  stack  layout  below 
the  current  stack  frame.  Additionally,  the  translation  is  parameterized  by  a  context  of  occurrence 
C  indicating  the  context  in  which  the  translated  term  occurs.  For  the  purposes  of  this  translation, 
contexts  range  over  return  contexts,  jump  contexts,  and  halt  contexts. 

$C ::= \mathrm{ret} \mid \mathrm{jmp}\ sv \mid \mathrm{halt}$

Return contexts indicate that the expression is a function body and should exit using the function return convention. Jump contexts indicate the expression occurs as a sub-block of a larger expression, and should exit by placing the return value in $r_t$ and jumping to the provided address. Halt contexts indicate that the expression occurs as a program body and should exit by terminating the program.

Additional contexts are possible: for example, the implementation uses a special context to efficiently translate boolean expressions whose only use is to decide branches. The implementation also uses this mechanism to implement proper tail-recursion: function calls occurring at the end of expressions translated in the return context are tail calls. The formal translation does not address this optimization since it adds nothing interesting to the development. As with the use of the destination parameter in the operation translation, this idea is closely related to the techniques used by Dybvig et al. [DHB90].

Most of the translation inference rules are parametric with respect to the context of occurrence. The exceptions are the rules for the small value base case of expressions, where the appropriate terminator for the instruction sequence is chosen.

Return  values 

For  return  contexts,  the  translation  inference  rule  requires  that  the  top  element  of  the  stack  (below 
the  frame)  be  an  appropriate  continuation  type^.  The  code  sequence  emitted  moves  the  return 
value  to  rt,  de-allocates  the  stack  frame,  coerces  the  registers  and  in-arguments  to  nonsense,  and 
returns. 

/\P1,P2  \-  (Ji  =  (cont(m)((j()((T2)(|Tr|))  >32  :  ST 

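$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : \tau_r \rightsquigarrow sv' \qquad \mathrm{frmsz}(\mathcal{A}) = n$

\[
\begin{array}{l}
\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{\mathrm{ret}} sv : \tau_r \rightsquigarrow \\
\quad \mathtt{mov}\ r_t, sv'; \\
\quad \mathtt{sfree}\ n; \\
\quad \mathtt{junkregs}; \\
\quad \mathtt{junkstack}\ 1 \ldots m; \\
\quad \mathtt{ret};
\end{array}
\]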

For  expressions  translated  in  a  jump  context,  the  context  of  occurrence  carries  with  it  the 
destination  of  the  jump.  The  translation  rule  makes  no  explicit  assumptions  about  the  state  of 
the  stack,  but  insists  that  the  type  of  the  destination  be  appropriate  for  a  jump.  The  translated 
code  moves  the  result  value  to  the  temporary  register  and  jumps  to  the  destination.  Note  that  the 
destination  is  assumed  to  have  been  previously  instantiated  with  all  of  the  necessary  type  variables: 
it  is  not  sound  to  simply  instantiate  it  with  the  current  typing  context  A,  since  A  may  contain 
more  type  variables  than  the  destination  is  expecting. 

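\[
\frac{|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_l : \Gamma_{\mathcal{A}}\{r_t : |\tau|\} \to 0 \qquad
\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : \tau \leadsto sv'}
{\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{\mathrm{jmp}\ sv_l} sv : \tau \leadsto
\begin{array}[t]{l}
\mathtt{mov}\ r_t, sv'; \\
\mathtt{jmp}\ sv_l
\end{array}}
\]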

^It  is  an  invariant  of  the  translation  as  stated  that  this  should  always  be  true.  Note  that  the  soundness  of  the 
translation  does  not  rely  on  this  invariant,  since  the  existence  of  a  translation  derivation  is  predicated  on  it  being 
satisfied.  However,  a  proof  of  completeness  of  the  term  translation  would  require  that  this  assumption  be  shown 
valid. 
Finally, expressions translated in a halt context move the result value to the temporary register,
de-allocate the stack frame, and halt the program. Note that the translation insists that the stack
segments below the current frame be empty.

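\[
\frac{\Psi; \Delta; \Gamma; \mathcal{A}, \epsilon, \epsilon \vdash_{32} sv : \tau_r \leadsto sv' \qquad \mathrm{frmsz}(\mathcal{A}) = n}
{\Psi; \Delta; \Gamma; \mathcal{A}, \epsilon, \epsilon \vdash_{\mathrm{halt}} sv : \tau_r \leadsto
\begin{array}[t]{l}
\mathtt{mov}\ r_t, sv'; \\
\mathtt{sfree}\ n; \\
\mathtt{halt}_{|\tau_r|}
\end{array}}
\]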
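
The three base-case rules differ only in the terminator they append after moving the result into the temporary register. The following Python sketch illustrates that choice. It is not the TILT implementation: the class names `Return`, `Jump` and `Halt`, the function `terminate`, and the use of plain strings for instructions are all assumptions made for illustration, and the small value is assumed to have been translated already.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Return:
    in_args: int  # m: number of in-argument stack slots to coerce to nonsense


@dataclass(frozen=True)
class Jump:
    dest: str  # destination label, assumed already instantiated


@dataclass(frozen=True)
class Halt:
    pass


Context = Union[Return, Jump, Halt]


def terminate(ctx: Context, sv: str, frame_size: int) -> List[str]:
    """Emit the instruction sequence for a translated small value `sv`."""
    code = [f"mov rt, {sv}"]  # every context first places the result in rt
    if isinstance(ctx, Return):
        code += [
            f"sfree {frame_size}",
            "junkregs",
            f"junkstack 1...{ctx.in_args}",
            "ret",
        ]
    elif isinstance(ctx, Jump):
        # The jump rule does not free the frame; the merge point still uses it.
        code += [f"jmp {ctx.dest}"]
    elif isinstance(ctx, Halt):
        code += [f"sfree {frame_size}", "halt"]
    else:
        raise TypeError(f"unknown context: {ctx!r}")
    return code
```

For example, `terminate(Jump("l_end"), "r1", 3)` yields only the move and the jump, while the return and halt contexts also free the frame.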


Spilling 

For the most part, the expression translation rules are syntax directed. However, I include one
non-syntax-directed rule in the translation to give the allocator more flexibility in its choice of
locations. The non-deterministic spill rule allows a shift to a new good allocator so long as a code
sequence can be found that re-arranges the frame appropriately.

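\[
\frac{\begin{array}{c}
\Psi; \Delta; \Gamma; \mathcal{A}', \sigma_1, \sigma_2 \vdash_c e : \tau \leadsto I \\
|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash S \Rightarrow \Gamma_{\mathcal{A}'} \\
\text{where } \mathcal{A}' \text{ is a good allocator for } e
\end{array}}
{\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_c e : \tau \leadsto S; I}
\]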

This  rule  is  intended  to  capture  the  fact  that  a  register  allocator  may  wish  to  insert  code  into  the 
instruction  stream  at  various  points  (such  as  spill  and  restore  code) .  Since  various  instructions  (such 
as  function  calls)  insist  that  the  register  file  be  empty,  this  rule  gives  the  allocator  an  opportunity 
to  obey  this  constraint  without  forcing  all  variables  live  across  the  call  to  be  permanently  stack 
resident. 
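
As a rough illustration of the kind of code sequence the spill rule admits, the Python sketch below computes moves that take one allocator to another. The representation of allocators as dictionaries, the `move` pseudo-instruction, and the helper names are assumptions for illustration only; the sketch also assumes every destination is free when it is written, so it does not handle cyclic shuffles.

```python
from typing import Dict, List, Tuple

# A location is ("reg", name) or ("slot", index); this encoding is hypothetical.
Location = Tuple[str, object]


def show(loc: Location) -> str:
    kind, name = loc
    return str(name) if kind == "reg" else f"sp({name})"


def spill_code(old: Dict[str, Location], new: Dict[str, Location]) -> List[str]:
    """Emit moves that re-arrange the frame from allocator `old` to `new`."""
    occupied = set(old.values())
    moves: List[str] = []
    for var in sorted(old):
        src, dst = old[var], new[var]
        if src == dst:
            continue  # variable stays where it is
        if dst in occupied:
            # A real allocator would need a temporary or a different order.
            raise ValueError(f"destination {show(dst)} is still live")
        moves.append(f"move {show(dst)} <- {show(src)}")
        occupied.discard(src)
        occupied.add(dst)
    return moves
```

Spilling `x` and `y` from registers to fresh stack slots before a call, for instance, leaves the register file empty while `z` stays in its slot.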

Operations 

The translation of ordinary LIL operation binding demonstrates the interaction between the
allocator and the destination-based translation used for operations.

'k;  A;  F;  hi,  cJi ,  (T2  h  A{x)  ^  opr  -.Ti'^S 
A;  F,  x:Ti]  A,  ui,  U2  he  e  :  r  / 

'k;  A;  F;  hi,  fJi,  (T2  he  let,-  x  =  opr  ine  :  r  5; 

I 

The  operation  opr  is  translated  with  respect  to  the  destination  chosen  by  the  allocator  for  x  to 
produce  a  partial  instruction  sequence  S  that  implements  the  operation  and  leaves  the  result  in 
the  chosen  destination.  The  rest  of  the  expression  is  translated  to  produce  an  instruction  sequence 
/,  to  which  S  is  prepended.  Uses  of  x  within  e  will  be  translated  via  the  allocator  to  references  to 
the  chosen  location  for  x. 
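
A minimal Python sketch of this destination-based scheme follows. The allocator is modelled as a dictionary from variables to location names, and the three-address `add` instruction and the function names are hypothetical; only the shape (translate the operation into the destination chosen for `x`, then prepend it to the translation of the body) is taken from the text.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def translate_let(
    x: str,
    opr: Tuple[str, Sequence[str]],
    translate_body: Callable[[Dict[str, str]], List[str]],
    alloc: Dict[str, str],
) -> List[str]:
    """Translate `let x = opr in e` given an allocator `alloc`."""
    op, args = opr
    dest = alloc[x]  # destination chosen by the allocator for x
    operands = ", ".join(alloc[a] for a in args)
    partial = [f"{op} {dest}, {operands}"]  # partial sequence S
    rest = translate_body(alloc)  # instruction sequence I for e
    return partial + rest  # S is prepended to I
```

Uses of `x` inside the body go through the same allocator, so they refer to the location the operation wrote to.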

Operations  translated  as  part  of  the  expression  translation 

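Some operations are not given direct translations as part of the operation translation, but are
instead translated directly by the expression translation. This is done to keep the operation
translation simpler, usually because the operations generate additional heap-allocated code blocks.
Sum and exception case analysis and handlers all fall into this category.

\[
\frac{\begin{array}{c}
\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : \mathrm{Dyn} \leadsto sv' \qquad
\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv_1 : \mathrm{Dyntag}(\tau_1) \leadsto sv_1' \\
\Psi; \Delta; \Gamma, x_1 : \times[\mathrm{Dyntag}(\tau_1), \tau_1]; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{\mathrm{jmp}\ \ell_{end}[(\Pi_1\Delta), \sigma_1, \sigma_2]} e_1 : \tau_x \leadsto I_1 \\
\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{\mathrm{jmp}\ \ell_{end}[(\Pi_1\Delta), \sigma_1, \sigma_2]} e_2 : \tau_x \leadsto I_2 \\
\Psi; \Delta; \Gamma, x : \tau_x; \mathcal{A}, \sigma_1, \sigma_2 \vdash_c e : \tau \leadsto I \\
\Gamma_{\mathcal{A}} = |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}\{r_e : \mathrm{Exnptr}(\sigma_2)\}
\end{array}}
{\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_c \mathtt{let}\ x = \mathtt{dyncase}(sv)(sv_1 \Rightarrow x_1.e_1,\ \_ \Rightarrow e_2)\ \mathtt{in}\ e : \tau \leadsto
\begin{array}[t]{l}
\mathtt{mov}\ r_t, sv' \\
\mathtt{brdyn}\ r_t, sv_1', \ell_1[(\Pi_1\Delta), \sigma_1, \sigma_2] \\
\mathtt{junk}\ r_t \\
I_2 \\
\ell_1 : \forall[|\Delta|, \rho_1{:}ST, \rho_2{:}ST].\Gamma_{\mathcal{A}}\{r_t : \times[\mathrm{Dyntag}(|\tau_1|), |\tau_1|]\} \\
\quad \mathtt{srmov}\ \mathcal{A}(x_1), r_t \\
\quad \mathtt{junk}\ r_t \\
\quad I_1 \\
\ell_{end} : \forall[|\Delta|, \rho_1{:}ST, \rho_2{:}ST].\Gamma_{\mathcal{A}}\{r_t : |\tau_x|\} \\
\quad \mathtt{srmov}\ \mathcal{A}(x), r_t \\
\quad \mathtt{junk}\ r_t \\
\quad I
\end{array}}
\]

Figure 8.3: Exception case translation in TILTAL.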

The translation of exception case analysis (figure 8.3) produces two new code blocks and a code
sequence. The code sequence compares the tag of an exception in register $r_t$ to a known tag and,
if they are equal, branches to the first additional code block, $\ell_1$. This code block moves the
refined exception into the destination assigned to $x_1$ by the allocator, and then executes the
translation of the body of the case statement, $e_1$. Notice, though, that the context parameter for
the expression translation is changed to indicate that the block should exit by jumping to the merge
point $\ell_{end}$. Since code blocks must be closed, all of the free type variables must be threaded
through the blocks: hence the instantiation of the destination label,
$\ell_{end}[(\Pi_1\Delta), \sigma_1, \sigma_2]$. (I use the notation $\Pi_1\Delta$ to indicate the
constructor variables bound in the domain of $\Delta$.)

In the case that the branch is not taken, the other branch of the case is executed. Note that
it too is translated in a context with the merge point as its jump destination. The remainder of
the expression, $e$, is translated in the original context of occurrence, and then is emitted with the
merge point label $\ell_{end}$, with the free type variables suitably abstracted. Note that by
convention the continuation expects its argument in $r_t$.
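
The block structure just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than the compiler's code: instructions are strings, the labels `l1` and `l_end` stand in for fresh labels, `inst` is the textual type instantiation threaded through the labels, and the sub-translations `i1`, `i2` and `i` are supplied by the caller (with `i1` and `i2` assumed to have been produced in a jump context targeting `l_end`).

```python
from typing import Dict, List, Tuple


def translate_dyncase(
    sv: str,
    sv1: str,
    dest_x1: str,
    dest_x: str,
    i1: List[str],
    i2: List[str],
    i: List[str],
    inst: str,
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return the main code sequence and the two new code blocks."""
    main = [
        f"mov rt, {sv}",
        f"brdyn rt, {sv1}, l1{inst}",  # taken when the tags are equal
        "junk rt",
    ] + i2  # fall through to the "otherwise" arm
    blocks = {
        # The "equal" arm: bind the refined exception, then run e1.
        "l1": [f"srmov {dest_x1}, rt", "junk rt"] + i1,
        # The merge point: bind the result to x, then run the rest.
        "l_end": [f"srmov {dest_x}, rt", "junk rt"] + i,
    }
    return main, blocks
```

Both arms end by jumping to `l_end`, which expects the result of the case in `rt`.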

The translation of sum cases follows essentially the same form, but with substantially more
cases. The principal additional complication is that numerous branches must be generated to
distinguish between the various tags and tagged records within the union type. No effort is made in
the formal translation to emit these in an especially efficient fashion (for example by using general
comparisons and intermediate branches instead of simple equality tests).
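
The linear chain of equality tests can be sketched as below. The instruction names `brtag`, `brtgd` and `forgetunion` are taken from the prelude code that appears in the soundness proof; everything else (the function name, the label naming scheme, the assumption that arms `0 .. num_tags-1` are bare tags and the remaining arms are tagged records, and the zero-based indexing) is an assumption made for illustration.

```python
from typing import List


def sum_case_prelude(sv: str, num_tags: int, num_arms: int, inst: str) -> List[str]:
    """Emit one equality test per arm except the last, which is reached by elimination."""
    if num_arms < 1:
        raise ValueError("a sum case needs at least one arm")
    code = [f"mov rt, {sv}"]
    last = num_arms - 1
    for i in range(last):
        op = "brtag" if i < num_tags else "brtgd"  # bare tag vs. tagged record
        code.append(f"{op}_{i} rt, l{i}{inst}")
    # Only one alternative remains, so the union can be forgotten.
    code += ["mov rt, forgetunion rt", f"jmp l{last}{inst}"]
    return code
```

The cost is linear in the number of arms, which is the inefficiency the text acknowledges.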

Exception handling is implemented in a relatively inefficient manner for simplicity in the translation
(see figure 8.4). The translation insists that the allocator spill all registers to the stack at every
handler  site,  and  then  pushes  the  old  exception  handler  frame  onto  the  stack  and  copies  the  entire 
frame  over  it.  The  handled  expression  is  translated  with  a  jump  continuation  that  forwards  control 
to  a  postlude  which  cleans  up  the  stack  and  restores  the  old  handler  before  jumping  to  the  merge 
point.  The  handler  code  itself  does  a  similar  cleanup  before  executing  the  handler  code,  which  is 
translated  with  the  final  merge  point  as  its  continuation. 

Raising exceptions is also dealt with at the expression level, not because it requires additional
code blocks to be emitted, but rather because exception raising code ends with an unconditional
jump and hence does not fit into the syntactic category of partial instruction sequences. To raise
an exception, the exception packet must be placed in $r_1$, and the handler's stack must be restored
from the exception frame before jumping to the handler, which is also obtained from the frame.

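\[
\frac{\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : \mathrm{Dyn} \leadsto sv'}
{\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_c \mathtt{let}\ x = \mathtt{raise}_{\tau}\ sv\ \mathtt{in}\ e : \tau \leadsto
\begin{array}[t]{l}
\mathtt{mov}\ r_1, sv' \\
\mathtt{loadr}\ r_t, r_e(1) \\
\mathtt{mov}\ \mathtt{sp}, r_t \\
\mathtt{loadr}\ r_t, r_e(0) \\
\mathtt{jmp}\ r_t
\end{array}}
\]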
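
The effect of the raise sequence on the machine can be sketched with a small Python model. The dictionary representation of the register file, the `pc` entry, and the encoding of the exception frame as a `(handler, saved_stack)` pair are assumptions for illustration; the pair layout follows the reading of the exception pointer as a handler at field 0 and a stack at field 1.

```python
from typing import Any, Dict


def raise_exception(state: Dict[str, Any], packet: Any) -> Dict[str, Any]:
    """Model: mov r1, sv'; loadr rt, re(1); mov sp, rt; loadr rt, re(0); jmp rt."""
    out = dict(state)
    out["r1"] = packet          # mov r1, sv'
    frame = out["re"]           # exception frame: (handler, saved stack)
    out["rt"] = frame[1]        # loadr rt, re(1)
    out["sp"] = out["rt"]       # mov sp, rt: restore the handler's stack
    out["rt"] = frame[0]        # loadr rt, re(0)
    out["pc"] = out["rt"]       # jmp rt
    return out
```

After the sequence, the current frame has been discarded wholesale: the stack is exactly the one saved when the handler was installed.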


Type  operations 

The  remainder  of  the  expression  translation  inference  rules  deal  with  the  type  instructions,  such  as 
unpacking  existentials  and  refining  types  using  the  LX  based  primitives.  Since  these  constructs  are 
essentially  the  same  in  LIL  and  TILTAL,  the  translation  does  little  interesting  work.  For  example, 
the  refinement  operation  for  pairs  simply  translates  the  rest  of  the  expression  and  prepends  a  pair 
refinement  instruction. 


A  \-  c=  a:  Ki  X  K2 

4';  A,/3:Ki,7:/i2;  1 

r[(/3,7)/a];  >  bc[(/3,7>/a]  e[{P,j)/a]:T[{P,j)/a]  I 

'k;  A,  a:Ki  x  ^2;  L;  A,  ui,  cJ2  be  let{P,  7)  =  c  ine  :  r  ref  ine(/3, 7)  =  |c|; 

/ 

The rest of the type instruction rules are similarly uninteresting and are not described here
further. A complete listing of the expression translation rules is given in section 8.4.

8.3.5  Soundness  of  the  expression  translation 

The soundness theorem for the expression translation says that if a well-formed LIL expression
translates to an instruction sequence and a heap fragment, then both the instruction sequence and
the heap fragment are well-formed. Note that the heap context $\Psi$ is extended with the additional
bindings introduced by the new heap when typechecking the instruction sequence. As before,
I assume that the allocator is a good allocator for the expression, and that the stack segment
parameters are well-formed.




^5  ^5  ^jmp£post[(ni  A),e,cr/i]  ^1  •  "^3;  ^1^  ^1 

^^5  ^5  ^5  ^2*  ^1  ;  ^2  ^  jmp  £end  [(I^l  A)  ,Crj'0(Tl  ,(72]  ^2  •  ^0^  -^2  5  -^2 

^';  A;r,x:ra;;^,  (Ti,o-2  he  e:r /;F 

Fa  =  {ri:ns32,  r2:ns32,  rt:ns32,  rg:  Exnptr(c72),  sp:o-/  o  cji  o  a2,  fi:ns64,  f2:ns64} 
where  Fa  =  |r|^^°'^^{re:  Exnptr((T2) 
and  Tb  =  |r|^^{re:  Exnptr((T3) 
and  af  =  |r  l!4(sp) 
and  CJ3  =  Exnptr((T2)  >32  a/  o  ai  o  <72 
and  ah  =  CTf  o  as 

d';  A;  F;  cji,  CJ2  he  let  x  =  handleT-^(ei,  X2.e2)  ine  :  r  ^ 

push  Tg 

stackcopy  (Exnptr(cj2))  >32  erj 
sfree  1 

malloc  rg [Exnhndler (cT/j),  cr/i](4andle [Hi A,  pi,  ^2],  sp) 

h 

4andie:V[A,  pi ,  P2] -{ri :  Dyn,  r2:ns32,  ri:ns32,  rg:ns32,  sp:e/i,  fi:ns64,  h'-nsQi} 

srmov  ^(x2),ri 
sfree  (frmsz  (^) ) 

pop  Fg 

junk  ri 

h 

^post  •  V[|A|,pi:5T,p2:5T].FB[rt:|r,|] 
sfree  (frmsz  (^) ) 

pop  Fg 

jmp  4nd[niA,ei,cr2] 

4nd :  V  [ I A I ,  PI :  5T,  P2 :  5T]  .F  A  [Ft :  I  r.  I  ] 

SFmOV  ^(x),Ft 
junk  Ft 

I; 

F; 

Fv, 

F2 


Figure 8.4: Exception handler implementation in TILTAL.
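Theorem 22 (Soundness of the expression translation.)

If $\mathcal{A}$ is a good allocator for $e$

and $\Psi; \Delta; \Gamma \vdash e : \tau$ exp
and $\Delta^{\rho_1,\rho_2} \vdash \sigma_1 : ST$

and $\Delta^{\rho_1,\rho_2} \vdash \sigma_2 : ST$
and $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_c e : \tau \leadsto I; F$

then

$|\Psi| \vdash F : \Psi_F$

and $|\Psi|, \Psi_F; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash I$ ok

where $\Gamma_{\mathcal{A}} = |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}\{r_e : \mathrm{Exnptr}(\sigma_2)\}$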

Proof:  By  induction  on  derivations.  The  proof  proceeds  by  cases  on  the  last  rule  of  the  translation 
derivation.  I  generally  identify  the  rule  under  consideration  by  listing  the  conclusion  of  the  rule, 
except  where  ambiguous. 

Throughout the proof, let $\Gamma_{\mathcal{A}} = |\Gamma|_{\mathcal{A}}^{\sigma_1 \circ \sigma_2}\{r_e : \mathrm{Exnptr}(\sigma_2)\}$.

1.  Suppose 'F;  A;  F;  .4.,  cJi,  £72  Fret  SI” :  TV mov  r^^,sv'-, 

sfree  n; 
junkregs; 
junkstack  1 . . .  m; 
ret 

To  show: 

|'F|;  A^i’^2;  F^  h  mov  r^,sv';  ok 

sfree  n; 
junkregs; 
junkstack  1 . . .  m; 
ret 

By  assumption: 

'F;  A;  F;  ^  h  sv.Tr 

'F;  A;  F;^,  £71,(72  F32  sv.Tr sv' 

AP1,P2  h  0-^  =  (cont(m)(£7j)(£72)(|rr|))  >32  n  >32  •  •  •  >32  Tm  >32  Crj  :  ST 

So  by  theorem  18: 

|^|;A/’i-P2;FaFs?;':|t^| 

By  the  mov  typing  rule: 

1^1;  A/’i-/'2;Fa  F  mov  rt,s?;'  ^  FA{rt:|rr|} 

By  the  good  allocator  assumption,  Fa(sp)  =  £7/  o  (£7i  o  £72)  and  frmsz(()^)  =  |£7j| 

So  by  the  sfree  typing  rule: 

|4'|;  A^i’/’2;FA{rt:|rr-|}  F  sfree  n  ^  FA{rt:|rr|}{sp:(£7i  o  £72)} 

By  lemma  34: 

|4'|;  A^i’/’2;FA{rt:|T,.|}{sp:(£7i  o£72)}  F  junkregs  ^ 

rA{rt:|F-|}{sp:(£7i  o  £72)}{ri:ns32}{r2:ns32}{fi:ns64}{f2:ns64} 

Let  F(4  =  FA{rt:|rr|}{sp:(£7i  o  £72)}{ri:ns32}{r2:ns32}{fi:ns64}{f2:ns64} 

By  assumption: 

AP1,P2  p  0-^  =  (cont(m)(£7j)(£72)(|rr|))  >32  n  >32  . . .  >32  Tm  >32  cr'i  :  ST 
So by lemma 35:

I'I'I;  |- junkstack  l...m^ 

r'^{sp:(ns32”')  o  o  0-2)} 

And  by  definition: 

COnt(m)((T'i)((T2)(|7V|)  =  Tret  ^  0 

where 

Tret  =  {ri:ns32,r2:ns32,fi:ns64,f2:ns64,re:Exnptr(fT2),rt:|rr|,sp:ns32™  o  o'^  o  £12} 

So  by  the  return  rule,  it  suffices  to  show  that  : 

rr{sp:nS32™'  O  fj'i  O  CJ2}  =  Tret 

By  the  good  allocator  assumption: 
rr(re)  =  rA(re)  =  Exnptr(fT2) 

So  by  definition: 

=  {ri:ns32,r2:ns32,fi:ns64,f2:ns64,re:Exnptr(cj2),rt:|T,.|,sp:ns32”^  o  o  £12}  =  Tret 

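2. Suppose $\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{\mathrm{jmp}\ sv_l} sv : \tau \leadsto \mathtt{mov}\ r_t, sv';\ \mathtt{jmp}\ sv_l$

To show:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{mov}\ r_t, sv';\ \mathtt{jmp}\ sv_l$ ok

By assumption:

$\Psi; \Delta; \Gamma \vdash sv : \tau$

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : \tau \leadsto sv'$

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv_l : \Gamma_{\mathcal{A}}\{r_t : |\tau|\} \to 0$

So by theorem 18:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv' : |\tau|$

By the mov typing rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{mov}\ r_t, sv' \Rightarrow \Gamma_{\mathcal{A}}\{r_t : |\tau|\}$

And by the jmp typing rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}}\{r_t : |\tau|\} \vdash \mathtt{jmp}\ sv_l$ ok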

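3. Suppose $\Psi; \Delta; \Gamma; \mathcal{A}, \epsilon, \epsilon \vdash_{\mathrm{halt}} sv : \tau_r \leadsto \mathtt{mov}\ r_t, sv';\ \mathtt{sfree}\ n;\ \mathtt{halt}_{|\tau_r|}$

By inversion:

$\Psi; \Delta; \Gamma; \mathcal{A}, \epsilon, \epsilon \vdash_{32} sv : \tau_r \leadsto sv'$

$\mathrm{frmsz}(\mathcal{A}) = n$

$\Psi; \Delta; \Gamma \vdash sv : \tau_r$

By theorem 18:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv' : |\tau_r|$

So by the mov typing rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{mov}\ r_t, sv' \Rightarrow \Gamma_{\mathcal{A}}\{r_t : |\tau_r|\}$

By the good allocator assumption:

$\Gamma_{\mathcal{A}}(\mathtt{sp}) = \sigma_f \circ \epsilon$
where $|\sigma_f| = \mathrm{frmsz}(\mathcal{A}) = n$

So by the sfree typing rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}}\{r_t : |\tau_r|\} \vdash \mathtt{sfree}\ n \Rightarrow \Gamma_{\mathcal{A}}\{r_t : |\tau_r|\}\{\mathtt{sp} : \epsilon\}$

And by the halt typing rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}}\{r_t : |\tau_r|\}\{\mathtt{sp} : \epsilon\} \vdash \mathtt{halt}_{|\tau_r|}$ ok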
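
The chain of register-file updates in this case can be replayed mechanically. The Python sketch below is a toy model, not the TILTAL type checker: register files are dictionaries, stack types are tuples of slot types, and the function names and the string `"|tau_r|"` are placeholders chosen for illustration.

```python
from typing import Dict, Tuple

RegFile = Dict[str, object]


def mov(gamma: RegFile, reg: str, ty: object) -> RegFile:
    """Register-file update written Gamma{reg : ty}."""
    out = dict(gamma)
    out[reg] = ty
    return out


def sfree(gamma: RegFile, n: int) -> RegFile:
    """Pop n slots from the stack type held in sp."""
    stack = gamma["sp"]
    if len(stack) < n:
        raise ValueError("frame smaller than requested sfree")
    return mov(gamma, "sp", stack[n:])


def halt_ok(gamma: RegFile, result_ty: object) -> bool:
    """halt requires the result in rt and an empty stack."""
    return gamma["rt"] == result_ty and gamma["sp"] == ()


def check_halt_sequence(frame: Tuple[str, ...], result_ty: str) -> bool:
    gamma_a = {"rt": "ns32", "sp": frame + ()}  # sp : sigma_f o epsilon
    g1 = mov(gamma_a, "rt", result_ty)          # mov rt, sv'
    g2 = sfree(g1, len(frame))                  # sfree n, with n = |sigma_f|
    return halt_ok(g2, result_ty)
```

The check succeeds precisely because the stack segments below the frame are empty, which is the side condition the halt rule imposes.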

4.  Suppose  \I';  A;  F;  cji ,  CJ2  \~c  let^-  x  =  opr  in  e  :  r  ^ 

By  inversion: 

'h;  A;  F;  fji ,  CJ2  b  ^(x)  ^  opr  -.Ti'^S 

'F;  A;  F,  x'-rp,  A,  ai,a2  be  e  :  r  I'F;  A;  F  b  opr  :  n  oprgj 

'F;  A;  F,  x:Ti  b  e  :  r  exp 

And by the good allocator assumption, $\mathcal{A}$ is a good allocator for $\Gamma$.

So  by  corollary  3 
|^|;A/’i-P2;FAb5^FB 

Where  F^  =  |F,  x:rj|^^°‘^^{re:  Exnptr(iTi)} 

By the definition of a good allocator for an expression, $\mathcal{A}$ is a good allocator for the
sub-expression $e$, so:

By  induction: 

|4'|  h  F-.'i/F 

|4'|,4'ir;  b  /  ok 

And  by  lemma  29  (partial  instruction  sequence  completion)  and  heap  context  weakening: 
|^|,^ir;A/’i’/’2;FA  b  (5;  I)  ok 

5.  Suppose  'F;  A;  F;  cji,  CJ2  be  let^-  Xf  =  fopr  ine  :  r  (5;  I);  F. 

By  inversion: 

'F;  A;  F;  fji,  (T2  b  A{xf)  ^  fopr 

'F;  A;  F,  A,  (Ti,  cJ2  be  e  :  r  /'F;  A;  F  b  fopr  :  <p  oprgj 
'F;  A;  F,  x/:r  b  e  :  r  exp 

And by the good allocator assumption, $\mathcal{A}$ is a good allocator for $\Gamma$.

So  by  corollary  4 
|^|;A/’i-P2;FAb5^FB 

Where  F^  =  |F,  x/:(()|)^^°'^^{re:  Exnptr(cJi)} 

By the definition of a good allocator for an expression, $\mathcal{A}$ is a good allocator for the
sub-expression $e$, so:

By  induction: 

|4'|  \-  F:^f 

|4'|,4'ir;  A^i’/’2.r^  b  /  ok 

And  by  lemma  29  (partial  instruction  sequence  completion)  and  heap  context  weakening: 

|^|,^ir;A/’i’/’2.r^  p  (5- j)  ok 
6.  Suppose  A;  F;  ^  he  let  [a,  x]  =  unpack  s?;  ine  :  r  unpack  [a, 

srmov  ^(a:),rt; 
junk  Ft; 

F 

By  inversion: 

'i/;  A-,T;A,ai,a2  hga  s?;:3[k][c]  sv' 

A,  a:K]  r,  x:{ca)]A,  ui,  1T2  l“c  e:  t  ^  I]  F 
'k;  A;  r  h  St! :  3[k](c) 

'k;  A,  a:K]  F,  x:{ca)  h  e  :  r'  exp  a  ^  fv{T') 

By  theorem  18: 

|^|;A/’i’/’2;FAhs?;':|3[K](c)| 

Since  |3[«:](c)|  =  3[k](|c|),  by  the  unpack  rule  it  suffices  to  show: 

I'I'I;  A,  q,|  p  srmov  ^(x),  Ft;  ok 

junk  Ft; 

I 

By  lemma  36,  the  machine  state  after  the  srmov  instruction  is  (informally  written): 
FA{rt:|c  a|}{^(x):|c  a\} 

And  by  lemma  34,  the  machine  state  after  the  junk  instruction  is  (also  informally  written): 
FA{rt:|c  a|}{^(x):|c  a|}{rt:ns32}  =  FA{hl(x):|c  a|} 

(since  FA(rt)  =  ns 32)- 

But  by  the  good  allocator  assumption: 

FA{hl(x):|c  a|}  =  F^ 

where  F^  =  |F,x:c  a|(^^°‘^^{re:  Exnptr(iTi)} 

Finally,  by  induction  and  heap  context  weakening: 

|4'|  h  F:4'i7 

I'I'I,  'hi?;  A,  F^  h  /  ok 

7.  Suppose  'F;  A;  F;  cji,  CJ2  he  let(/3,  j)  =  cine:T  F,  F  and  A  h  c  =  (ci,  C2) :  /ti  x  K2- 
By  inversion: 

4';  A,  ;F;^,fTi,fT2  he  e[ci,  02//?,  7]  F,F 
4';  A;F  h  e[ci,  02//?,  7]  :r  exp 

By  induction: 

|4'|  h  F-.'Iif 

|4'|,4'ir;  h  /  ok 

8.  Suppose 

'F;  A,  a:/ti  x  K2 ;  F ;  7l,  cii ,  cj2  he  let{P,  7)  =  c  ine  :  r  ref  ine(/3, 7)  =  |c|; 

F 

and  A,  a:Ki  x  K2  b  c  =  a  :  ki  x  K2- 
^C'[{/3,7>/a]  e[(/3,7)/a]  :^[(/3,7)/a]  I]  F


By  inversion: 

A,/3:ki,7:k2; 

r[(/?,7)/a]; 

A  0-1  [(/?,  7)/«] ,  0-2 [(/3, 7) /a] 

A,/3:Ki,7:K2,A';r[(/3,7)/a]  h  e[{P , 'j) / a]  :  T[{f3 , 'j) / a]  exp 

By  assumption: 

A,  a:Ki  X  h  (Ti :  S'T 
A,  a:Ki  X  h  (T2  :  ST 

So  by  substitution  : 

A,P:Ki,r-K2^’^^  ^  cri[\{l3,j)\/a]:ST 
A,P:Ki,r-H2^’^''  b  a2[\{P,'y)\/a]:ST 

So  by  induction  : 
l^'l  h  F-.'i/F 

1^'1,^'ir;  A,/3:«;i,7:K^^’^^;r5  h  /  ok 
where 

By  lemma  33: 

Ts  =  \T\T^^[\{P,j)\/a]=TA[\{P,7)\/a] 

So  by  the  pair  refinement  rule: 

I'I'I,  'hi?;  A,  a:«:i  x  a  b  ref  ine(/3, 7)  =  |c|;  ok 

/ 

9.  Suppose  'k;  A;  B;  7l,  cJi,  CJ2  be  let  fold /3  =  c  ine  :  r  F  and  A  h  c  =  f  old^y^  c' :  ^j-K. 
By  inversion: 

A;r;7l,cri,cri  be  e[c' / pi\  F,F 

'k;  A;  r  h  e[c' / 13]  :  r  exp 

By  induction: 

l^'l  h  F:^'e 

1^'1,^'ir;  A^i’/’2.r^  h  /  ok 
10.  Suppose 

'k;  A,  a:iJ,j.K,  A';  F;  7l,  fii,  <72  be  let  f  old/3  =  c  ine  :  r  ref  ine[f  old/3]c; 

F, 

F 

and  A,  a:fj,j.K,  A'  h  c  =  a  :  fij.K. 

By  inversion: 

^;A,P:K[nj.K/j],A'; 

T  [i  old  fj_j,i^f3/ a]] 

A  be[ford^7,/3H  e[foldf,j,^(3/a]  :  r[f old/^y^ /3/q;]  ^  F,F 

ai[iold^j,^P/a], 

fTi[fold/,yK/3/a] 

A,/3:«:[w.K/j],  A';r[fold^yK/3/a]  b  e[iold^j,^P/a]  :  r[f  old^y^ /3/q;]  exp 
By assumption:

A,  a:iJ,j.K,  \- 

A,  a:^j.K,  A'^i’^2  \-  ■  gx 

So  by  substitution  : 

A,/3:K[fij.K/j],A'P^’P‘^  h  cji[|  fold^y^/^l/a]  -ST 
A,/3:K[nj.K/j],A'P^’P‘^  h  a2[\ioldf,j,^P\/a]  :  ST 

So  by  induction  : 
l^'l  \-  P-.'ilF 

F]  A,  f5:K[nj.K/j],  A'Pi’P^;Ts  h  /  ok 
where  Tg  =  |r[f  old^y^ /3/a] 

By  lemma  33: 

r5  =  |r|]4°'^^[|  fold^y«;/3|/a]  =  r^[|  fold^yK/3|/a] 

So  by  the  fold  refinement  rule: 

I'I'I,  'hi?;  A,  a:ixj.K,  A'p^’P2-^Ya  h  ref  ine[f  old /3]c;  ok 

I 


11.  Suppose  'k;  A;  T  he  let,-  inj^  /3  =  (c,  sv)  ine  :  r  I;F  and  A  h  c  =  ••■,««]  g  .  _|_ 

[k1  Ki,  ...  ,  Kn] . 

By  inversion: 

A;r;^,  fTi,cJ2  he  e[T  /  [5]  ■.T'^  I;F 
'k;  A;  r  h  e[cY/3]  :  r  exp 

By  induction: 

l^'l  h  F-.^f 

1^'1,^'ir;  A^i’/’2;rA  h  /  ok 

12.  Suppose 

'k;  A,  a:  +  [ki,  . . . ,  Kn],  A'  he  letT-injj/3  =  (c,  sv)  ine  :  r  ref  ine[inj ^  /3]|c|,  sv'] 

I; 

F 


and  A,  a:Ki  +  K2,  A'  h  c  =  a  :  ki  +  K2. 

By  inversion: 

'k;A,/3:«„A';  \ 

r[inj./3/a];  V  he[inj^/3/a]  e[inj  • /3/a]  :  r[inj  • /3/a] F 

hl,o-i[inji/3/a],(T2[injj/3/a]  J 

A,/3:Kj,A';r[iiijj/3/a];^,  cJi[injj/3/a],o-2[inj^  /3/a]  hga  s?;[inj^- /3/a]  :Void'^  sv' 
jGl...z  —  l,i  +  l...n 

^]  A,  f5:Ki,  A']  r[iiiji  /3/a]  h  e[inji  /3/a]  :  r[iiiji  /3/a]  exp 
'k;  A,  f5\Kj,A']  r[iiij^.  /3/a]  h  s?;[inj^-  /3/a]  :  Void 
jGl...z  —  l,i  +  l...n 

By  assumption: 

A,  a:  +  [ki,  . . . ,  k^],  A'^i’^2  'r  ai'.  ST 
A,  a:  +  [ki,  . . . ,  Kn],  A'^1’^2  |-  cr2  :  5T 
So by substitution:

A,  /3:Ki,  A'P^'P‘2  I-  o-i[|  inj  ./^l/a]  :  ST 
A,l3:Ki,ATi^P^ha2[\±n2i/3\/a]:ST 
j  G  1 . . .  n 

So  by  induction  : 
l^'l  h  P-.'iiF 

where  =  |r[inj^ /3/a] 

By  lemma  33: 

inji/3|/a]  =  rA[|  inji/3|/a] 

By  theorem  18  (small  value  translation): 

I'I'I;  Ap-.Ki,  A'P'^'P'^-^Ty  b  sv' :  Void 
j  G  1...Z  —  l,i  +  l...n 
where  =  |r[inj  • 

So  by  the  sum  refinement  rule: 

I'I'I,  'hi?;  A,  a:  +  [ki,  . . . ,  k„] ,  A'^i ;  r a  b  ref  ine[inj ^ /3]|c|,  s?;';  ok 

I 

13.  Suppose  the  last  rule  used  was  the  case  translation  rule.  The  translation  of  a  case  statement 
produces  a  number  of  new  blocks  and  an  instruction  sequence.  By  lemma  30  it  suffices  to 
show  that  each  of  these  is  well-formed  when  the  heap-context  is  extended  with  the  types  of 
the  new  blocks.  Note  that  I  assume  all  labels  are  fresh.  I  leave  informal  the  inner  induction 
on  the  number  of  branches  n. 

Let 

£o:y[\A\,pi:ST,p2:ST].TA[rt-\ro\], 

h:y[\Alpi:ST,p2:ST].TA[rt-\n\], 


4:V[|A|,pi:5T,p2:5T].rA[rt:|r„|], 

4nd :  V[ I A I ,  PI :  5r,  p2 : 5r]  .r  A  [rt :  I  r  |] 


(a)  Heap  fragments. 

To  show: 

1^1  bF;  :^(F;Fi;...;F„) 

Fi; 


Fn 

By  inversion: 

'k,  A,  T,  Xi-Ti,  .A,  (Tl,  (72  bj,np£^^^[(niA),(Ti,cr2]  ^  *^0  “k)  /  G  1  ...  71 

A;r,Xi:ri  b  e* :  r  exp 
A;r,x:r;^,  cji,£72  \-c  e:Te F,  F 
di;  A;  r,  x:t  b  e  :  Te  exp 
By induction:

1^1 

l^'l  h  F:^' 

So  by  lemma  30: 

1^1 

(b)  The  prelude  code: 

To  show: 

|di|,  d'i?;  /S.P^'P^]T A  b  niov ,  sv  '  ok 

brtagQ  Ft ,  4  [(Hi  A) ,  cJi ,  (T2] 

brt agfc_ ^  Ft ,  4- 1  [(Hi  A) ,  0-1 , 0-2] 
brtgdfc  Ft,  4 [(III A),  CTi,  0-2] 

t>rtag„_t  Ft,4-i[(niA),  fJi,  CJ2] 
mov  Ft ,  f  orgetunion  Ft 
jmp4[(niA),fTi,cJ2] 

By  inversion: 

A;r;^,  fTi,cJ2  I-32  sv  :  V[n,  •  •  •  ,Tn]  sv' 

Ah  n  =  Tag(i)  :  T32  i  €  1 . . .  (j  -  1) 

A  h  n  =  x[Tag(i),T'j  :T32  i€j...n 

A;r  h  s?; :  V['ro, 

By  theorem  18: 

1^1;  A/’i-/’2.r^  I-  s?;' :  I  V^o,  •  •  ■  ,Tk]\ 

So  by  the  mov  rule: 

1^1;  A/’i-/’2.r^  h  mov  rt,s4  ^  r2i{rt:|  V['^o,  •  •  •  ,'rfc]|} 

By  theorem  17  and  weakening: 

APi,P2  p  |^.|  =  Tag(i)  :  T32  i  G  1 . . .  (j  -  1) 

AP1-P2  p  It-^i  =  x[Tag(i),r(]  :T32  i  G  j  ...n 
By  assumption,  iih^[\A\,  pi:ST,  p2:ST].T A[rt-\Ti\], 

So the brtag and brtgd instructions are well-formed and have well-formed targets.
After $n - 1$ tag comparisons, $r_t : \vee[|\tau_n|]$. (There is an inner induction here, about which
I am being informal.)

Therefore, by the forgetunion and the mov rules:

1^1;  A/’i-/’2.r^{rt:V[|Tn|]}  b  mov  Ft,  f  orgetunion  Ft  rA{rt:||rn||} 

So  the  final  jump  is  well-formed  as  well. 

(c) $\ell_i$, for $i \in 0 \ldots n$.

To  show: 

1^'1,^'ir;  A^i’/’2.p^[t.^.|.^.|]  h  srmov  A(xi),rt  ok 

junk  Ft 

Ii 

By  inversion: 

'k.  A,  T,  Xi-Ti,  A,  (Tl,  (72  A),(Ti,cr2]  ^  *^0  -4  i  G  1  ...  71 

A;r,Xi:ri  b  e* :  r  exp 
By induction:

1^'1,^'ir;  1- /.  ok  zel...n 

where  =  \T ,  Xi:Ti\'^°''^ 

By  the  good  allocator  assumption: 
ri(rt)  =  ns32  and 

r,  =  \T\Y^^{A{xiy.\Ti\}  =  rA{^(xi):|ri|} 

So  by  lemmas  36  and  34: 

I'I'I,  'hi?;  p  srmov^(xj),  ok  i  G  1 . . .  n 

junk  rt 

h 

(d) $\ell_{end}$

To  show: 

I'I'I,  'hi?;  \-  srmov  A{x),r^  ok 

junk  rt 
/ 

By  inversion: 

4';  A;r,x:r;^,  cJi,o-2  \-c  e:Te^  I]F 
'k;  A;  r,  x:t  h  e  :  Tg  exp 

By  induction: 

okwhere  T^;  =  |r, 

By  the  good  allocator  assumption: 
r2;(rt)  =  ns32  and 

Tx  =  |r|^i°"4^(x):|r|}  =  r^{^(x):|r|} 

So  by  lemmas  36  and  34: 

|'k|,  A^i’^2.  p  srmov ^(x),  rt  ok 

junk  rt 
/ 


14.  Suppose  the  last  rule  used  was  the  dyncase  translation  rule.  The  translation  of  an  exception 
case  statement  produces  a  number  of  new  blocks  and  an  instruction  sequence.  By  lemma  30 
it  suffices  to  show  that  each  of  these  is  well-formed  when  the  heap-context  is  extended  with 
the  types  of  the  new  blocks.  Note  that  I  assume  all  labels  are  fresh. 

Let 

^F  =  ^{F;Fi;Fn), 

4:V[|A|,pi:5r,p2:-S'r].r^[rt:  x  [Dyntag(|ri|),  |ri|]] 

4nd :  V[  I A I ,  PI :  5r,  p2 :  5r]  .r  A  [rt :  I  r.  I  ] 


(a)  Heap  fragments. 

To  show: 

1^1  h  F;  :^(F;Fi;F2) 

4; 

F2 
By inversion:

^';A;r,xi:  x  [Dyntag(ri),  n];  ^  cji,  (72  ^jmp4nd[(niA), <71,^2]  ei :  h;  Fi 

^^5  ^5  *^5  f  ^2  ^ jmp 4nd[(ni  A),cri ,<72]  ^2  •  T~x  Ft  F2 

^■,A-,T,x:Tx]A,ai,a2  \-c  e:T  ^  r,F 
A;r,xi:  x  [Dyntag(ri),  n]  \-  ei'.Tx  exp 
A;r  h  e2:Tx  exp 
A;  r,  x:Tx  1“  e  :  r  exp 
By  induction: 
l^'l  h  Fi :  d'l 

I'&l  h  F2  :  'I'2 
l^'l  h  F:^' 

So  by  lemma  30: 

|d'|  h  F;Fi;F2:^'',d'i,4'2 

(b)  The  “otherwise”  case: 

To  show: 

I'I'I,  T^  h  mov ,  sv  '  ok 

brdyn  r^,  s?;i,4[(niA),fTi,cJ2] 

junk  rt 

h 

By  inversion: 

'k;  A;r;^,  (Ti,cr2  I-32  sv  :  Dyn'^  sv' 

'k;  A;  T  h  Sri :  Dyn 
By  theorem  18: 

|\k|;  /\Pl>P2.  \-  gy>  .  I  pyj2  I 

So  by  the  mov  rule: 

1^1;  A/’i-/’2;rA  b  mov  rt,s?;'  ^  r^{rt:|Dyn|} 

By  assumption: 

h-A[\A\,pi:ST,p2:ST].TA{rt:  x  [Dyntag(|ri|),  |ti|]} 

So  by  the  brdyn  rule: 

|d'|;  A^i’/’2;rA  b  brdyn  rt,  sr'^, 4[(niA),  ui,  0-2]  ^  rA{rt:||  Dyn  ||} 

And  by  lemma  34: 

|4'|;  A^i’/’2;rA  b  junk  rt  ^  rA{rt:||  Dyn  ||}{rt:ns32} 

Note  that  rA{rt:|  |  Dyn  |  |}{rt:ns32}  =rA. 

By  induction: 

|4'|,4'i7;  p  ok 

So  the  result  follows  directly  by  the  instruction  sequencing  rules. 

(c) The “equal” case, $\ell_1$:

To  show: 

|4'|,4'i7;  x  [Dyntagdnl),  |ri|]}  b  srmov ^(xi),  rt  ok 

junk  rt 

h 

By  inversion: 

4';A;r,xi:  x  [Dyntag(ri),  n];  ^  cJi,  0-2  bj„,p4^^[(niA), <71,^2]  ei :  Tx h]  Fi 
4^;  A;r,xi:  x  [Dyntag(ri),  n]  b  ei :  4,  exp 




By  induction: 

h  h  okwhere  Ti  =  \T , 

By  the  good  allocator  assumption: 
ri(rt)  =  ns 32  and 

Bi  =  X  [Dyntag(ri),Ti]|}  =  rA{^(a:i):  x  [Dyntag(|ri|),  |ri|]} 

So  it  suffices  to  show  that: 

1^'1,^'ir;  X  [Dyntagdnl),  |ri|]}  h  srmov  rt  ^  Bg 

junk  rt 

where  Bg  =  Ba{^(xi):  x  [Dyntag(|ri|),  |ri|]} 

Which  follows  by  lemmas  36  and  34. 

(d) The continuation, $\ell_{end}$:
To  show: 

I'I'I, 'hi?;  A^i’^2.  h  srmov  ^(x),rt  ok 

junk  rt 
I 

By  inversion: 

4^;  A;B,x:ra;;^,o-i,o-2  he  e:T  ^  r,F 
'B;  A;  B,  x:Tx  B  e  :  r  exp 
By  induction: 

okwhere  B^;  =  \T ,  x:Tx\^J^°^^ 

By  the  good  allocator  assumption: 

B2;(rt)  =  ns32  and 

Bx  =  |B|^i°'^2{^(x):|Ta;|}  =  BA{Bl(x):|rj;|} 

So  by  lemmas  36  and  34: 
jd^l,  d'i?;  A{^t'-\Tx\}  B  srmov^(x), rt  ok 

junk  rt 
I 


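15. Suppose

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_c \mathtt{let}\ x = \mathtt{raise}_{\tau}\ sv\ \mathtt{in}\ e : \tau \leadsto \mathtt{mov}\ r_1, sv';\ \mathtt{loadr}\ r_t, r_e(1);\ \mathtt{mov}\ \mathtt{sp}, r_t;\ \mathtt{loadr}\ r_t, r_e(0);\ \mathtt{jmp}\ r_t$

By inversion:

$\Psi; \Delta; \Gamma; \mathcal{A}, \sigma_1, \sigma_2 \vdash_{32} sv : \mathrm{Dyn} \leadsto sv'$
$\Psi; \Delta; \Gamma \vdash sv : \mathrm{Dyn}$

By theorem 18:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash sv' : |\mathrm{Dyn}|$

By assumption:

$\Gamma_{\mathcal{A}}(\mathtt{sp}) = \sigma_f \circ \sigma_1 \circ \sigma_2$

$\Gamma_{\mathcal{A}}(r_e) = \mathrm{Exnptr}(\sigma_2)$

By definition:

$\mathrm{Exnptr}(\sigma_2) = \times[\mathrm{Exnhndler}(\sigma_2), \sigma_2]$

$\mathrm{Exnhndler}(\sigma_2) = \forall[].\{r_1 : \mathrm{Dyn}, \mathtt{sp} : \sigma_2\} \to 0$

By the mov rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}} \vdash \mathtt{mov}\ r_1, sv' \Rightarrow \Gamma_{\mathcal{A}}\{r_1 : |\mathrm{Dyn}|\}$

By the load rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}}\{r_1 : |\mathrm{Dyn}|\} \vdash \mathtt{loadr}\ r_t, r_e(1) \Rightarrow \Gamma_{\mathcal{A}}\{r_1 : |\mathrm{Dyn}|\}\{r_t : \sigma_2\}$

By the stack load rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}}\{r_1 : |\mathrm{Dyn}|\}\{r_t : \sigma_2\} \vdash \mathtt{mov}\ \mathtt{sp}, r_t \Rightarrow \Gamma_{\mathcal{A}}\{r_1 : |\mathrm{Dyn}|\}\{r_t : \sigma_2\}\{\mathtt{sp} : \sigma_2\}$

By the load rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}}\{r_1 : |\mathrm{Dyn}|\}\{r_t : \sigma_2\}\{\mathtt{sp} : \sigma_2\} \vdash \mathtt{loadr}\ r_t, r_e(0) \Rightarrow \Gamma_{\mathcal{A}}\{r_1 : |\mathrm{Dyn}|\}\{\mathtt{sp} : \sigma_2\}\{r_t : \mathrm{Exnhndler}(\sigma_2)\}$

By the jump rule:

$|\Psi|; \Delta^{\rho_1,\rho_2}; \Gamma_{\mathcal{A}}\{r_1 : |\mathrm{Dyn}|\}\{\mathtt{sp} : \sigma_2\}\{r_t : \mathrm{Exnhndler}(\sigma_2)\} \vdash \mathtt{jmp}\ r_t$ ok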

16.  Suppose  the  last  rule  used  was  the  handle  translation  rule.  The  translation  of  an  exception 
handler  produces  a  number  of  new  blocks  and  an  instruction  sequence.  By  lemma  30  it  suffices 
to  show  that  each  of  these  is  well-formed  when  the  heap-context  is  extended  with  the  types 
of  the  new  blocks.  Note  that  I  assume  all  labels  are  fresh. 

Let 

=  |r|^®{re:Exnptr(cr3) 

=  |r|^(sp) 

CJ3  =  Exnptr((T2)  >32  O'/  o  cji  o  <72 
(Th  =  o-f  o  0-3 
^F  =  ^{F;Fi;Fn), 

4andle:V[A,pi,/92]-{ri:Dyn,  sp:cJh, . . .} 

4ost:V[|A|,pi:5r,p2:5T].rs[rt:|r,|] 

4nd :  V[  I A I ,  Pi :  5r,  p2 :  5r]  .r  A  [rt :  I  r,  I  ] 

(a)  Heap  fragments. 

To  show: 

1^1  h  F;  :^(F;Fi;F2) 

F2 

By  inversion: 

^1  *^3  ^jmp£post[(ni  A),e,(T/i]  •  Tc  4,  Fi 

byn,  .A,  Cl,  C2  (jj]  ^2  •  T~x  4,  F2 

^■,A-,T,x:Tx-,A,ci,C2  \-c  e-.T  ^  F,F 
A;r  h  ei  :Tx  exp 
'k;  A;  T,  X2:  Dyn  h  62  :  exp 
'k;  A;  T,  x:Tx  L  e  :  r  exp 
By  induction: 
l^'l  h  Fi : 

I'k]  h  F2  :  'k2 

l^'l  h  F:^' 
So by lemma 30:
l^'l  h  F;Fi;F2:'i>',^i,^2 
(b)  The  handled  code  block: 

To  show: 

I'I'I,  d'i?;  T/i  h  push  rg  ok 

stackcopy  (Exnptr((T2))  >32  o'/ 
sfree  1 

malloc  re[Exnhndler(cj/i),  fj/i] (4andle[niA,  pi,  ^2],  sp) 

h 

By  assumption: 
rA(sp)  =  O'/  O  CJl  O  fT2 

rA(re)  =  Exnptr((T2) 

The  introductory  sequence  of  instructions  only  modifies  the  stack  and  the  exception 
register,  so  for  brevity  I  leave  the  rest  of  the  typing  context  implicit  and  only  describe 
the  types  of  the  stack  register  and  the  exception  register  after  the  instructions. 


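Instruction   sp type                                                                                   r_e type                       Comment
push          $\sigma_3 = (\mathrm{Exnptr}(\sigma_2)) \rhd_{32} \sigma_f \circ \sigma_1 \circ \sigma_2$   $\mathrm{Exnptr}(\sigma_2)$    By the push rule
stackcopy     $(\mathrm{Exnptr}(\sigma_2)) \rhd_{32} \sigma_f \circ \sigma_3$                             $\mathrm{Exnptr}(\sigma_2)$    By lemma 41
sfree         $\sigma_h = \sigma_f \circ \sigma_3$                                                        $\mathrm{Exnptr}(\sigma_2)$    By the sfree rule
malloc        $\sigma_h$                                                                                  $\mathrm{Exnptr}(\sigma_h)$    By the malloc rule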
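
The stack-pointer column of this table can be checked with a small model. In the Python sketch below, stack types are tuples of slot types with the top of the stack first, and the abstract segments $\sigma_1$ and $\sigma_2$ are represented by opaque placeholder entries. The reading of `stackcopy` as duplicating a prefix of the stack on top of itself is inferred from the table rather than from a definition given in this section, and all function names are hypothetical.

```python
from typing import Tuple

Stack = Tuple[str, ...]


def push(stack: Stack, ty: str) -> Stack:
    return (ty,) + stack


def stackcopy(stack: Stack, k: int) -> Stack:
    """Copy the top k slots and place the copy on top of the stack."""
    return stack[:k] + stack


def sfree(stack: Stack, n: int) -> Stack:
    return stack[n:]


def handler_prelude_stack(sigma_f: Stack, sigma_1: Stack, sigma_2: Stack):
    """Return (sigma_3, sp type after push; stackcopy; sfree 1)."""
    exnptr = "Exnptr(sigma_2)"
    sp = sigma_f + sigma_1 + sigma_2
    sp = push(sp, exnptr)                  # push re
    sigma_3 = sp
    sp = stackcopy(sp, 1 + len(sigma_f))   # stackcopy (Exnptr(sigma_2)) |> sigma_f
    sp = sfree(sp, 1)                      # sfree 1: drop the copied handler pointer
    return sigma_3, sp
```

After the three instructions the stack has the shape $\sigma_f \circ \sigma_3$, that is, a fresh copy of the frame sitting above the saved handler pointer and the original frame.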

So  it  suffices  to  show  that: 

1^'1,^'ir;  A^i’/’2.p^{sp.^^}{rg;Exnptr(fT3)}  h  Ii  ok 
But  note  that  : 

|r|^"  =  |r|^i°"4sp:u,}  =  rA{sp:u,} 

So  the  desired  result  follows  by  induction. 

(c)  The  handler  block: 

Let  Th  =  {ri:Dyn,r2:ns32,rt:ns32,re:ns32,sp:cr/i,fi:ns64,f2:ns64} 
To  show: 

I'Ll,  Ti?;  b  srmov  ^(x2),  ri  ok 

sfree(frmsz(^)) 
pop  re 
junk  ri 

h 


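Instr    machine state                                                                                                                                              Comment
srmov    $\Gamma_H\{\mathcal{A}(x_2) : \mathrm{Dyn}\}$                                                                                                               By lemma 36
sfree    $\Gamma_H\{\mathcal{A}(x_2) : \mathrm{Dyn}\}\{\mathtt{sp} : \sigma_3\}$                                                                                     By the sfree rule
pop      $\Gamma_H\{\mathcal{A}(x_2) : \mathrm{Dyn}\}\{\mathtt{sp} : \sigma_f \circ \sigma_1 \circ \sigma_2\}\{r_e : \mathrm{Exnptr}(\sigma_2)\}$                    By the pop rule
junk     $\cdots\{r_1 : \mathrm{ns32}\}$                                                                                                                             By lemma 34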

But  note  that  by  assumption: 

=  |r|^^°'"4re:Exnptr(cr2)} 

=  {ri:ns32,  r2:ns32,  Yt.nsz2,  ^e-  Exnptr(fT2),  sp:fT/  o  (ji  o  £12,  fi:ns64,  f2:ns64} 
And by definition:

Tnisp-.af  o  cji  o  a2}{re:  Exnptr(cr2)}{ri:ns32} 

=  {ri:ns32,  r2:ns32,  rt:ns32,  re'.  Exnptr(fT2),  sp:fT/  o  cti  o  a2,  fi:ns64, 

=  Ta 

So  by  the  good  allocator  assumption: 
r//{^(x2):  Dyn}{sp:cr/  o  cji  o  a2}{re:  Exnptr((T2)} 

=  rA{^(x2):Dyn} 

=  |r,  X2-.  Dyn Exnptr((T2)} 

Finally,  by  induction: 

I'I'I,  'I'ir;  a{-A{x2)'-  Dyn}  h  I2  ok 

(d)  The  postlude  block: 

To  show: 

I'I'I, 'hi?;  p  sfree(frmsz(^))  ok 

pop  re 

jmp  4nd[niA,o-i,o-2] 

Note  that  T s(sp)  =  cj/i  =  <7/  o  a^. 

So  by  the  sfree  rule: 

1^'1,^'ir;  I-  sfree(frmsz(^))  ^  rB{rt:|T|}{sp:cr3} 

By  the  pop  rule: 

A^i’/’2.p^{r^.|^|}{sp.^3}  p  pop 

rs{rt:|r|}{sp:cr/  o  ui  o  cr2}{re:  Exnptr(cr2)} 

Note  that: 

rB{rt:|r|}{sp:cj/  o  cji  o  fT2}{re:  Exnptr(cj2)} 

=  |r|;4  °'^^{rt:|r|}{re:  Exnptr(cr2)} 

=  rA{rt:|T|} 

And  by  assumption: 

^end  :V[|A|,pi:5r,p2:5T].rA[rt:|r,|] 

So  by  the  jmp  rule: 

A^i’/’2;rA{rt:|r|}  h  jmp  4nd[niA,  ui,  CJ2]  ok 

(e)  The  continuation  block: 

To  show: 

I'I'I,  'hi?;  A^i’^2.  p  srmov  ^(x),  rt  ok 

junk  rt 

I 

By  lemma  36: 

1^'1,^'ir;  A^i’/’2;rA{rt:|ra:|}  h  srmov  ^(x),rt  ^  rA{rt:|r2,|}{^(x):|r2,|} 
By  lemma  34: 

|4'|,4'ir;  |- junk  rt  ^  rA{^(x):|ra,|} 

By  the  good  allocator  assumption: 

rA{^(x):|r2,|}  =  |r,xr3,|^^°'^^{re:  Exnptr(cr2)} 

By  inversion: 

4';  A;r,x:ra;;^,o-i,o-2  he-  e:r /;F 
'k;  A;  T,  x'.Tx  b  e  :  r  exp 
So  by  induction:

A^i’/’2.r^{^(x):|T,|}  h  /  ok 

17.  Suppose  'k;  A;  F;  cji,  CJ2  be  e  :  r  (S:!)]  F  via  the  spill  rule. 
By  inversion: 

'k;  A;r;^',(Ti,(72  be  e:r /;F 
where  A'  is  a  good  allocator  for  e 
'k;  A;  r  b  e  :  r  exp 

By  induction: 

I  dt|  b  F  '.'i/p 

1^'1,4'ir;  A^i’/’2;rs  b  I  okwhere  Bs  = 

By  inversion: 

1^1;  A/’l-Pl;  b  5  ^ 

So  by  lemma  29  (partial  instruction  sequence  completion): 

1^'1,4'ir;  A^i’/’2.r^  ^  S-j  ok 


8.3.6  Heap  values,  heaps,  and  programs 
Heap  values 

dt,  Q?!  .Kl ,  .  .  .  ,  Xl-Tl,  .  .  .  ,  Xfn-T~mj  ^1-^1  j  ■  ■  ■  j  A,  (T\ ,  (72  bret  6  .  T  /,  F 

Where  .4,  is  a  good  allocator  for  e 
and  ai  =  (cont(m  +  2  >i=  n)(pi)(p2)(|T|))  >32  0-32  o  ^64  o  pi 
and  (T32  =  In  I  >32  •  •  •  >32  \Tm\  >32  e 
and  CJ64  =  I  </>l  I  >64  •  •  •  >64  1 4>rL  \  >64  C 
and  a2  =  P2 
and  I  =  frmsz(^) 

and  hval  =  code[ai:Ki, . . .  ,ak:Kk,  pv-ST,  p2:ST]\  Code(ri,  •  •  .,Tm)i4>i,  ■  ■ .  ,(/>n)('r)| 
salloc  1] 

srmov  ^(xi), sp(/ +  0); 


srmov  A{xm),  sp(/  +  m  —  1); 
srfmov  ^(n),  sp(/ +  m  +  2  *  0); 


srfmov  sp(Z  +  m  +  2  *  (n  —  1)); 

/ 

4'  hh  code^[ai:Ki, . . . ,  afc:Kfc](xi:ri,  •  • . ,  Xm:rm)(n:</>1,  •  •  • ,  Zn.(j)n)-e: 

V[ai:Ki, . .  .,ak:Kk]  Code(Ti,  •  •  ■  ,Tm)i(pi,  ■  ■  ■An){T)  hval]  F 

Since the only heap values in the LIL are code blocks, the heap value translation has only one
inference rule, for translating code functions. Code functions are translated by translating their
bodies with a good allocator and with appropriate stack segment parameters. Recall that the stack
segment parameters describe the stack below the current frame (which is managed by the allocator).
Consequently, the stack segments in the code function translation reflect the calling convention on
entry to the function: the top stack segment consists of an appropriate return continuation and the
32 and 64 bit arguments pushed onto the first stack segment type variable ($\rho_1$). The second stack
segment, describing the section below the current handler, is simply the second stack type variable
($\rho_2$). Notice that the occurrence parameter on the expression translation derivation indicates
that the expression occurs in a return context and hence should return its value by popping the
frame and calling the return continuation from the stack.

The result of translating the body is an instruction sequence I and a heap fragment F containing
additional code blocks to be allocated at the top level. In order to complete the translation, it
is necessary to wrap the instruction sequence I with additional code to allocate the frame and
to initialize the locations assigned to the argument variables from the locations assigned to the
in-arguments by the calling convention. In principle the allocator may sometimes render this
initialization code unnecessary by using variables directly from their in-argument slots, but for
simplicity I do not attempt to perform this optimization here. The wrapped code sequence is then
abstracted with respect to the free type variables (including the stack segment parameters) to turn
it into a closed heap value.
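
The wrapping step described above can be sketched in a few lines of Python. This is only an illustration of the slot arithmetic in the rule (32-bit argument i at sp(l + i), 64-bit argument j at sp(l + m + 2j)), not the compiler's code: the function name `function_prologue`, the string form of the instructions, and the dictionary used to stand in for the allocator are all assumptions made for the example.

```python
def function_prologue(frame_size, int_args, float_args, alloc):
    """Build the entry sequence for a translated code function (illustrative).

    frame_size : l = frmsz(A), the number of 32-bit slots in the local frame
    int_args   : names x1..xm of the 32-bit arguments
    float_args : names z1..zn of the 64-bit arguments
    alloc      : hypothetical stand-in for the allocator A, mapping each
                 variable to the location assigned to it
    """
    m = len(int_args)
    instrs = [f"salloc {frame_size}"]
    # The i-th 32-bit in-argument sits just above the fresh frame, at sp(l + i).
    for i, x in enumerate(int_args):
        instrs.append(f"srmov {alloc[x]}, sp({frame_size + i})")
    # Each 64-bit in-argument occupies two slots, after the m 32-bit slots.
    for j, z in enumerate(float_args):
        instrs.append(f"srfmov {alloc[z]}, sp({frame_size + m + 2 * j})")
    return instrs
```

For instance, with a frame of three slots, two 32-bit arguments, and one 64-bit argument, the sketch emits one `salloc`, two `srmov` instructions reading sp(3) and sp(4), and one `srfmov` reading sp(5). The translated body I would follow this sequence.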


Theorem  23  (Soundness  of  the  heap  value  translation) 
If  'k  h  hval :  r  hval 

and  'k  \-fi  hval :  r  ^  hvahF 

then 

|4'|  \-  F-.'f/p 

and  I'kl  h  hval :  |r|  hval 


Proof:  By  construction. 

By  inversion: 

^  is  a  good  allocator  for  e 

. . . ,  zp.cpi, 

^]ai:Ki, . . .  ,ak:Kk-,xi:Ti  ,  .  .  .  ,  XfYi.Tm,') 

And  by  theorem  15  and  construction: 
h  0-1 :5T 
h  o-2:5'T 

So  by  theorem  22: 

|4'|  h  F-.'^f 

h  /  ok 

where  Ta  =  |xi:ri, . . . ,  Xm-Tm,  zi.(t>i,  ■■■,  Zn.<j)n\A°''‘^{^e.  Exnptr(fT2)} 
It  remains  to  be  shown  that: 

|4'|  h  hval :  |V[ai:«:i, . . .  ,afc:Kfc]  Code(ri, . .  .,Tm){(pi,  ■  ■  .,(j)n){T)\  hval 


. . .  ,Zn.(j)n  I-  e:r  exp 
.  .  .  ,  Zn-fpm  “T,  (Ji,  <T2  b]-0t  e  .  T  /,  F 
By  lemma  29,  it  suffices  to  show  that:

|\J/|;  A^l’^2.  Pq  |-  ga]_]_QC  /j 

srmov  ^(xi), sp(/ +  0); 


srmov  A(xm),  sp(/  +  m  —  1); 
srfmov  ^(zi),  sp(/ +  m  +  2  *  0); 


srfmov  sp(^  +  m  +  2  *  (n  —  1)); 

where  Tq  =  {ri:ns32,r2:ns32,fi:ns64„f2-ns64,re:Exnptr(a2),r^:ns32,sp:(7i} 
and  Ta  =  ,  Xm-Tm,  zi'Ai, . . . ,  Zn:4>n\A°''‘^Ue-Exnptr{a2)} 

By  the  salloc  rule,  it  suffices  to  show  that  : 

I'I'I;  |- srj^ov  ^(xi),  sp(/ +  0);  ^ 

srmov  A{xm),  sp(^  +  m  —  1); 
srfmov  ^(zi),  sp(/ +  m  +  2  *  0); 

srfmov  sp(/  +  m  +  2  *  (n  —  1)); 

where  Bi  =  {ri:ns32,  r2:ns32,  fi:ns64,  f2:ns64,  rg:  Exnptr(cr2),  rt:ns32,  sp:ns32'  >32  cri} 
Which  follows  by  induction  on  n  +  m 

•  Suppose  n  +  m  =  0 
Then  by  assumption: 

=  I  •  l5^°'^Mre:Exnptr(fT2)} 

And  by  the  good  allocator  assumption: 

I  •  l5‘°‘^Mre:Exnptr((T2)} 

=  {ri:ns32,  r2:ns32,  fi:ns64,  f2:ns64,  fe:  Exnptr(cr2),  rt:ns32,  sp:ns32'  >32  cri} 

•  Suppose  m  >  0.  Then 
By  induction: 

|T|;  p  srj^ov  ^(xi),  sp(Z  +  0);  ^  B'^ 

srmov  ^(xm), sp(/ +  m  —  1); 
srfmov  ^(zi),  sp(l  +  m  +  2  *  0); 

srfmov  A{zn-i),sp{l  +  m  +  2  *  {n  —  2)); 
where  B'^  =  jxpri, . . .  ,Xm-Tm,  zp^i, . . . ,  Exnptr(fT2)} 

So  by  the  definition  of  partial  instruction  sequences,  it  suffices  to  show  that: 

|\|/|;  A/’i’/’2;  p'^  h  srfmov  ^(z„),  sp(/  +  m  +  2  *  (n  —  1))  ^  B^ 

But  note  that  by  lemma  37: 

|vl/|;  A/’1-P2.  p'^  h  srfmov  A{zn),sp{l  +  m  +  2*  (n-l))  ^  T'^{A{zn)-\4>n\ 
And  by  the  good  allocator  assumption:

Ta  =  \xi-.Ti, . . .  Zn-i {fe:  Exnptr ((72) } 

=  |xi:ri,  .  .  .  ,  Xm-Tm,  •  •  •  ,  Zn-i:(j)n-lQ°''^  Ue- Exnptr  ((T2)  }{^(2:n) :  |  </>n  | } 

=  T'^{A{Zn):\<Pn\} 

•  Suppose  m  =  0  and  n  >  0.  Then 

By  induction: 

|\j/|;  A/’i>P2.  |- gj^j^Qv  A(xi),  sp(Z  +  0);  ^ 

srmov  A(xm-i), sp(/ +  m  —  2); 

where  T'^  =  . . . ,  x^-TTm-i,  Exnptr(cj2)} 

So  by  the  definition  of  partial  instruction  sequences,  it  suffices  to  show  that: 

|\|/|;  h  srmov  A(xm),  sp(/  +  m)  ^  T^ 

But  note  that  by  lemma  36: 

|vj/|;  A/’1’P2.  I- srmov  A(xm),sp(/ +  m)  ^  r'^{A(xm):|rm| 

And  by  the  good  allocator  assumption: 

=  \xi:ti,  . . .  ,Xm-l-Tm-l,Xm-Tm\‘^°'"^{re:Exnptr{a2)} 

=  |xi:ri, .  .  .  ,  Xm-l-Tm-l\'^°''‘^{re-  Exnptr(fT2)}{A(Xm):|Tm|} 

—  r^{  A(Xm)  •  |Xm| } 


Heaps 


'I'[£:r]  h/i  hval :  r  hval']  F  d  F' 

'h  h  e  e  'h[£:r]  h  d,t-T  ^  hval  £:\T\.hval',  F;  F' 

The translation of heaps simply translates the individual heap bindings to produce a heap
value and a new heap fragment. The heap value can then be bound to the assigned label, and
incorporated with the new heap fragment into the rest of the re-written heap.

The heap soundness theorem is stated in two parts. Since the translation returns a heap
fragment for each heap value, I first show that the result of the translation is well-formed as a heap
fragment. The well-formedness of the translation of a closed well-formed heap then follows almost
immediately.
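
As a rough illustration of the shape of the heap translation, the following Python sketch walks a list of heap bindings, translates each heap value, and collects the translated binding together with the heap fragment it produces. The representation of heaps as lists of triples and the callback parameters `translate_hval` and `translate_type` are assumptions made for the example; they stand in for the heap value translation and the type translation, which are not modeled here.

```python
def translate_heap(bindings, translate_hval, translate_type):
    """Translate a heap given as a list of (label, type, hval) bindings.

    translate_hval(hval, ty) is a hypothetical callback returning a pair
    (hval', fragment), where fragment is a list of extra bindings.
    translate_type(ty) is a hypothetical callback for the type translation.
    """
    out = []
    for label, ty, hval in bindings:
        new_hval, fragment = translate_hval(hval, ty)
        # Bind the translated value to its original label ...
        out.append((label, translate_type(ty), new_hval))
        # ... and incorporate the fragment produced while translating it.
        out.extend(fragment)
    return out
```

The order of bindings in the result is not significant in this sketch; it only shows that every source binding contributes one translated binding plus a (possibly empty) fragment.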

Theorem  24  (Soundness  of  the  heap  translation) 

If  'h  h  d  ok 

and  d  F 

then 

I dt]  h  E  ok'k p 


Proof:  (By  induction  on  heaps) 
•  Suppose  d  =  e. 
By  definition:

'I'  h  e  e 

And  by  the  heap  fragment  typing  rule: 
1^1  h  e:e 

•  Suppose  d  =  d',£:T  ^  hval 

By  inversion: 

'h[£:r]  h  d'  ok 
'h[£:r]  h  hval :  r  hval 
^[£:r]  hd^F' 

'h[£:r]  h/i  hval :  r  hval']  F 

By  induction: 

|^[£:r]|  h  F'  ok^'^ 

By  theorem  23: 

|'I'[t':r]  I  h  hval' :  |r| 

|^'[£:r]|  h  F  ok^F 

By  lemma  30: 

|^[£:r]|  h  F]F'  ok^F,^F 

By  the  heap  fragment  formation  rule: 
mtT]  \  h  t\T\.hval',F]F'  ok^F,^'F 


Programs 


^ (d)  d  ^  H'  •;  •;  A,  e,  e  hhait  e  :  t  I' ]  F 

Where  A  is  a  good  allocator  for  e 
and  H  =  F]H', 

^ihandle:  Exnhandler(e) 
mov  rt,ri; 
halt^Dyn 

and  R  =  {ri  ns32,  r2  ns,  ns32,  vt  ^  ns,  fi  ns64,  f2  ns64,  ft  ns64,  sp  e} 
and  I  =  malloc  re[Exnhndler(e),  e](fihandle,  sp);  I' 

•  H  letrec  d  in  e  :  r  H,R,I 

The translation of a LIL program is obtained by translating the heap to obtain a TILTAL heap
($H'$), and translating the body of the program to obtain an instruction sequence ($I'$) and some
additional heap blocks to be incorporated into the heap ($F$). Note that the body of the program
is translated with a halt occurrence parameter, indicating that the instruction sequence should be
terminated with a halt instruction. In order to satisfy the calling conventions, it is necessary to
add a final exception handler block to the heap ($\ell_{handle}$) and allocate an appropriate exception
frame containing it in re. The initial register file contains nonsense in all of the general purpose
registers, and an empty stack in the stack register.
Theorem  25  (Soundness  of  the  program  translation)
If  •  h  letrec  dine :  r 

and  •  H  letrec  d  in  e :  r 

then 

•  h  ok 


Proof: 

By  inversion: 

^(d)  h  d  ok 
^(d) 

So  by  theorem  24: 

|^(d)|  h  H'  ok^'^ 

By  inversion: 

4'(d);*;*;^,  e,e  Kait  e\T-^  I'-,F 
'k(d);  •  h  e  :  r  exp 

And  by  the  empty  stack  rule: 

AP1.P2  h 

So  by  theorem  22: 

|^(d)|  ^  F-.^f 

|'k(d)|,'ki?;  A^i’^2.pp|j.g.Exnptr(e)}  h  F  ok 
where  Bq  =  |  •  |^ 

By  construction: 

fihandle^  Exnhandler(e)  h  fihandle^  Exnhandler(e)  ok 

mov  rt,ri; 
halt^ya 

And  hence: 

^handle:  Exnhandler (e) ,  f  b  F;  H' ,  ok 

^ihandle:  Exnhandler(e) 
mov  rt,ri; 

By  the  good  allocator  assumption: 

I  •  |^{re:Exnptr(e)} 

=  {ri  :ns32,  V2'-ns,  Vg:  Exnptr(e),  rt:ns,  fi:ns64,  f2-nse4,  ft:ns64,  sp:e} 

By  the  malloc  rule: 

Ahandle:Exnhndler;  A;ro  h  malloc  re[Exnhndler(e),  e]  (^handle,  sp)  ^  ro{re:  Exnptr(e)} 

So  by  the  instruction  rule: 

|4'(d)|,4'(F),fihandle:Exnhndler;  A;ro  b  /  ok 

So  by  the  program  rule  : 
h  (H,  R,  I)  ok 
A;r;^,  fTi,cJ2  I-32  i :  Int  i


Ahc=V[---,Ci,...]:T32 
'I';  A;r;^,  (Ti,(T2  I“32  sv.Ci^  sv' 

'I';  A;  F;  cTi,  £72  l“32  inj_union^  sv.c^  inj_unioii|^l  sv' 

A  h  r  =  Rec[K](c)(cp)  :T32 
'F;  A;  F;  cji,  cJ2  F32  sv  :  c{Kec[n]c)cp  ^  sv' 

'F;  A;  F;  iTi,  £72  F32  roll,-  sv  :t  ^  rollj^l  sv' 

A  h  r  =  Rec[K](c)(cp)  :T32 
'F;  A;F;^,  £7i,£72  F32  sv.t^  sv' 

'F;  A;  F;  £71,  £72  F32  unroll,-  sv  :  c(Rec[K]c)cp  unroll|,-|  sv' 

Ah  t  =  3M(c')  :T32 
'F;  A;  F;  £7i ,  £72  h32  sv  :c'c^  sv' 

'F;  A;  F;  £7i,  £72  h32  pack  sv  as  r  hiding  c:t 

(pack[|r|]|c|)s?;' 


'F;  A;F;^,  £7i,£72  h32  sv  :V[k](c')  sv' 
'F;  A;F;^,  £7i,£72  h32  s?;[c]  :c'c^  s^llcl] 
Float  values


A;  F;  ^,0-1, 0-2  fv  :  (p  fv' 


A;r[x/:(/)];^[x/  -^f],ai,a2  xf.  <j) f 


A;r[x/:(/)];^[x/ ^  sp(i)],cr2,fT2  I-32  x f.  <j) sp{i) 


'F;  A;r;^,  (T2,(T2  I-32  r:  Float  ^  r 


64  bit  Instructions 


'I';  A;  F;  cji ,  (72  F  dest32  ^  fopr  ■.(p'^  S 


4^;  A;  F;  ^,0-1, 0-2  F  ft  ^  fopr:(p-^  S 

4';  A;  F;  cji,  (72  F  sp(i)  ^  fopr  :(p^  S] 

fswrite  sp(i),ft; 
fjunk  ft 


4';  A;  F;  ^,0-1, 0-2  ^32  fv  :  (p  fv' 

4^;  A;  F;  i7i,  72  F  f  ^  fv  :(p'^  fmov  {t,fv' 

4^;  A;  F;^,  71,72  F32  svi :  Array g4((/))  sv[  4^;  A;  F;  7i,  72  F32  SV2  :  Int  SV2 

4';  A;  F;^,  71,72  F  f  ^  s\ib^{svi,  SV2)  :  ^  subj^l  f,  s?;t,st’2 


4^;  A;  F;  7i,  72  F32  sv  :  Boxed(0)  sv' 


4^;  A;  F;  7i,  72  F  f  ^  unbox  sv  :(p'^  f  loadr  f,  sv' 


Instructions 


4^;  A;  F;  7i ,  72  F  dest32  <—  opr  it'^S 


4^;  A;  F;  7i,  72  F  rt  <—  opr  :t'^S 


4^;  A;  F;  71, 72  F  sp(i)  ^  opr  :t  S; 

swrite  sp(i),rt; 
junk  rt 


4^;  A;  F;^,  71,72  F32  sr  :  r  ^  sv' 

4^;  A;  F;  7i,  72  F  r  <—  sr  :  r  movr,  sv' 
'I';  A;  ai,a2  I-32  sv:  n  x  ...  Tn  sv'


'I';  A;  F;  A,  ci,  (T2  F  r  ^  select*  sv.Ti'^  movr,  sv' 

loadr  r,  r(i) 


A;  T;  0-1,  CJ2  h  r  ^  dyntag^  :  Dyntag(c)  dyntag|^l  r 

A;r;^,cri,cr2  l-e4 /y  :  Float /y' 

'F;  A;  F;  (Ti,  CJ2  F  r  ^  boxfv  :  Boxed((/>)  malloc|^l  r,fv' 


A;F;^,cri,cr2  F32  svi  :Ti  st' 


A;F;^,cri,fT2  F  r  ^  . . 


sVn):Ti  X  ...  X  Tn^malloc  r,  [|ri|,...,|rn|](st'i,...,st'„) 


|r|^(sp)  =  {ri:ns32,r2:ns32,re:ns32,ri:ns32,fi:ns64,f2:ns64,sp  :  a/} 
AP1-P2  \-  af  =  ns  32  >32- ■■  >32  ns32  >32  nse4  >64  •  •  •  nse4  >64t' :  ST 

' - V - '  ' - V - ' 

m  k 

'F;  A;  F;  fJi ,  (72  F32  St :  Code(ro  )  •  •  •  )  Tm—  l){(j)0,  .  .  sv' 

'F;  A;  F;  CJI ,  CJ2  F32  St  j :  Tj  St  •  i  G  0  . . .  m  -  1 

4';  A;F;^,  cJi,o-2  F32  /tj :  0*  /t  •  i  G  0  . . . /c  -  1 

A;F;^,  cJi,o-2  F  r  ^  call  st(sto, . . . ,  stm-i)(/to,  •  •  •  Jvk-i) 

fswrite  sp(m  +  2*  {k  — 

fswrite  sp(m),/tg 
swrite  sp(m  —  1),  st^_]^ 

swrite  sp(0),stQ 

junkregs 

call  sv'[a'  o  ai,  <72] 

movj  r,  rt 


'F;  A;F;  cji,(T2  F32  sti :  Int  st'^  'F;  A;  F;^,  cri,  cr2  F32  st2  :  r  sv'2 

'F;  A;  F;  cji,  CJ2  F  r  ^  array .^(sti,  SV2)  ■  Array32(T)  malloc|.^|  r,  st'^,  sv'2 

'F;  A;  F;  (Ti,  (72  F32  sv  :  Int  sv'  'F;  A;  F;  ai,a2  \-64  fv  :(j)^  fv' 

'F;  A;  F;  C7i,  (72  F  r  ^  f  array(st,/t) :  Farray((/))  fnialloc|0|  r,  sv' ,fv' 
^;A;T;A,ai,a2  I-32  svi :  Array32(r)  ^  sv[  'I';  A; T;  cji, <72  I-32  SV2  :  Int  SV2


'I';  A;  F;  A,  ui,  1T2  F  r  <—  subT-(s?;i,  SV2)  ■  t  ^  subj^l  r,  sv'i,  sv'2 

A;r;^,  fTi,cJ2  I-32  svi  :Array32(r)  ^  sv'i 
'F;  A;r;^,  cji,(T2  I-32  s?;2  :  Int sv'2  T;  cJi,  cr2  I-32  sv^-.t^  sv'^ 

'F;  A;  F;  A,  ui,  cJ2  F  r  ^  upd^(s?;i,  SV2,  SV3) :  Unit  upd  sv'^,  sv'2, 

malloc  r,  []() 


A;F;^,  fTi,cJ2  F32  s?;i :  Arrayg4((/))  ^  sv'^ 

'F;  A;  F;  (Ti,  iT2  F32  SV2  :  Int  sv'2  T;  cJi,  CJ2  \-64  fv  fv' 

'F;  A;  F;  cji,  CJ2  F  r  ^  upd0(s?;i,  sv2,fv) :  Unit  upds?;^,  sv'2,fv' 

malloc  r,  []() 


Expressions 


'F;  A;  F;  (Ti,  CJ2  Fc  e  :  r  r,H 


'F;  A;F;^',cji,(T2  Fc  6:7^^  /;F 
1^1;  A^i’/’i;  p  5  ^  \TQr^ 

where  A'  is  a  good  allocator  for  e 


4';  A;F;^,fTi,fT2  Fc  e:r (5:/);F 


APi,P2  \-  (ji  =  (cont(m)((74)((T2)(|'rr|))  >32  a'^:ST 
'F;  A;  F;  A,  ui,  1T2  F32  sv  :Tr  ^  sv'  frmsz(^)  =  n 

'F;  A;  F;  fji,  CJ2  F^-et  sv  :Tr  mov  rt,  sv'] 

sfree  ri] 
junkregs; 
junkstack  1 . . .  m; 
ret 


|d/|;  A^i’^F  F  r{rt:|r|}  ^  0 

'F;  A;F;^,  cri,cr2  F32  sv.t^  sv' 

'^]A]T]A,ai,a2  sv.T-^raov  y^,sv' 

jmp  svi 


4';  A;F;^,  e,  eF32  sv.Tr  ^  sv'  frmsz(^)  =  n 


'F;  A;  F;  e  F^ait  sv  :Tr  ^  mov  r^ ,  sv' 

sfree  n 
haltr^ 
'i/]  A;T]  A,ai,a2  1“  ^(x)  <—  opr  :Ti'^  S
'if,  A-,T,x:Ti-,A,ai,a2  he  e:r /;F 

'h;  A;  F;  fji,  cj2  he  let^  x  =  opr  ine  :  r  (5;  /);  F 


A;  F;  cji ,  (T2  h  A{xf)  ^  S 

^■,A-,T,Xf:4'i]A,ai,a2  he  e:T  r,F 

A;F;^,  cJi,o-2  he  let^  Xf  =  fopr  in  e  :  r  ^  {s-,i);F 


'F;  A;F;^,  o-i,e2  hga  sv  :3[k][c]  ^  sv' 

A,  a:K]  F,  x:{ca)]  A,  cxi,  <72  he  e:T  ^  I;F 

'F;  A;  F;  ^  he  let  [a,  x]  =  unpack  s?;  ine  :  r  unpack  [a,  rt]st'; 

srmov  A{x),r^, 
junk  rt; 

F 


A  h  C  =  (ci,C2)  :  Kl  X  K2 

A,  ;F;^,cri,cr2  he  e[ci,  C2//3, 7]  :  r[ci,  02//?,  7]  F,  F 


A;F;7l,fTi,fT2  he  let{P,'y)  =  cin  e  :  r  F,F 


Ah  c  =  a:  Ki  X  K2 
A,/3:ki,7:k2;  1 

r[(/?,7)/«];  >  ^c[{p,'r)/a]e[{PA)/oi]:T[{P,j)/a]'^  F,F 

Acri[{P,j)/a],a2[{(5,f)/a]  ) 


'F;  A,  a:Ki  x  «;2;  F;  7l,  fii,  cj2  he  let{P,  7)  =  c  ine  :  r  ref  ine(/3, 7)  =  |c| 

/; 

F 


Ah  c  =  f  c' :  fj.K 

A;T;A,ai,ai  he  e[c! / (3]  I;F 

'F;  A;  F;  7l,  ui,  1T2  he  let  fold  /3  =  cine:r^/;F 


A,/3:K[^j.K/j],  A'; 
F[fold^j.K/3/a]; 

hi, 

fji  [told^j,^,l5/a\, 
fji  [fold/,j.K/3/a] 


A,  a-.p.j.K,  A'  h  c  =  a  :  p,j.K 

'  he[foid^^,,/3H  e[fold^j.«/3/a]  :r[fold/,j,«/3/a]  ^  F,F 


'F;  A,  a:nj.K,  A';  F;  7l,  cii,  <T2  he  let  fold /3  =  c  in  e  :  r  ref  ine[f old /3]c; 

I; 

F 
A\-  c=  ^  ^

A;r;^,  cJi,o-2  he  e[c7,5]  '.t^  I;F 


'h;  A;  r  he  letr  inj^  /?  =  (c,  s?;)  ine  :  r  F 


A,  a:  +  [ki  . . . ,  Kn] ,  A'  h  c  =  a  :  +[«!...«„]  /3  ^  A,  A' 

'I/;A,/3:ACi,A';  'I 

r[inji/3/a];  >  he[inj,/3/a]  e[inj  • /3/a]  :  T[inj  • /3/a] F 

hl,o-i[iiijj/3/a],cr2[inji/3/a]  J 

A,/3:Kj,  A';r[inj^  /3/a];^,o-i[iiij^. /3/a],fT2[inj^  /3/a]  hga  s?;[injj /3/a]  :Void'^  sF 

j  G  1 . .  .i  —  +  1 . .  .n 

'h;  A,  a:  +  [ki,  . . . ,  Kn],  A'  he  letr  inj^  /3  =  (c,  s?;)  ine  :  r  ^  ref  ine[injj  /3]jcj,  sr'; 

F 


'h;  A;  F;  cji,  CJ2  hgj  sr  :  Dyn  sv' 

'h;  A;  F;  fji,  cJ2  he  let  x  =  raise,-  sr  ine  :  r  movri,  sv 

loadr  rt,re(l) 
movsp, 
loadr  rt,re(0) 
jmprt 
A;T;A,ai,a2  I-32  sv  :  Dyn'^  sv'  'I';  A; T; cji, CJ2  I-32  svi :  Dyiitag(ri)  ^  sv'

^';A;r,xi:  x  [Dyntag(ri),  n];  cji,  0-2  hj^p4^^[(niA),fTi,<72]  ei :  /i;  Fi 

*^5  5  ^2  ^  jmp  ^end  [(ni  A),cri,(72]  ^2  •  '^x  ^2 ;  F2 

A;r,x:ra;;^,  cJi,o-2  'rc  e:T  I]  F 

\Y\<noa2  ^ 

'I';  A;  F;  cji,  cJ2  Fc  let  x  =  dyncase(sx)(s?;i  ^  xi.ei,  _  ^  62)  ine  :  r 

movr^,  sv' 

brdyn  r^,  s?;i,£i[(niA),cJi,o-2] 

junk  rt 

h 

ei:\/[\A\,  pr.ST,  p2:ST].r A[rt-  x  [Dyntag(|ri|),  |ri|]] 
srmov  ^(xi), 
junk  rt 

h 

^end‘  V[|A|,pi:5r,p2:5r].rA[rt:|T3:|] 
srmov  ^(x),rt 
junk  rt 

F; 

Fv, 

F2 
r,  J[,  e,  CJ3  l“jmp£post[(ni  A),e,cr^]  ■  Tx  III  Fl

^5  ^2‘  ^1 ;  ^2  ^  jmp  ^end  [(1^1  A)  ,crj'0(Ti  ,cr2]  ^2  •  ^0^  -^2;  -^2 

A;r,x:ra;;^,  cJi,o-2  l-c  e:r /;F 

Ta  =  {ri:ns32,  r2:ns32,  rt:ns32,  rg:  Exnptr((T2),  spia/  o  cji  o  a2,  fi:ns64,  f2:ns64} 
where  =  |r|^^°'^^{re:  Exnptr((T2) 

and  =  |r|^®{re:  Exnptr(cj3) 

and  af  =  |r  l!4(sp) 
and  CJ3  =  Exnptr((T2)  >32  a/  o  cji  o  <72 
and  ah  =  af  o  <73 

'h;  A;  F;  C7i,  £72  Fc  let  x  =  handleT-^(ei,  X2.e2)  ine  :  r  ^ 

push  Te 

stackcopy  (Exnptr(c72))  >32  (Jf 
sfree  1 

malloc  re[Exnhndler(cj,j),  £7/i] (4andle[niA,  pi,  ^2],  sp) 

h 

^handle  :V[A,  pi,  p2]-{ri:  Dyn,  r2:ns32,  ri:ns32,  re:ns32,  sp:ah,  fi:ns64,  h'-nsQi] 
srmov  ^(x2),ri 
sfree  (frmsz  (^) ) 
pop  Ye 
junk  ri 

h 

4ost:V[|A|,pi:5T,p2:5T].FB[rt:|r,|] 
sfree  (frmsz  (^) ) 
pop  Ye 

jmp  4nd[niA,£7i,£72] 

4nd:V[|A|,pi:5T,p2:5T].FA[rt:|r.|] 
srmov  ^(x),rt 
junk  Ft 

F; 

Fi; 

F2 
A;r;^,  fTi,fT2  I-32  sv  :  V['ri,  •  •  • ,  t„]  ^  sv'

A  h  Tj  =  Tag(i) :  T32  i  G  1 . . .  (j  -  1)  A  h  r*  =  x  [Tag(i),  r']  :  T32  iGj...n 

A,  r,  Xi.Ti^  CTl ,  (72  ^  jmp  £entl[(ni  A),cri  ,<72]  ^  ^  \  ...  Tl 

A;  r,  x:t]  A,  ai,a2  he  e  :  Te I;  F 

iri^io.a  ^ 

'I';  A;  F;  cji,  cJ2  he  let  x  =  case^(s?;)(xo.eo,  •  •  • ,  Xn.Sn)  ine  :  Tg 

movr^,  sv' 

brt ago  rt ,  4  [(Hi  A) ,  cji ,  a2] 


hrtaLgi^_^  rt ,  4-i [(Hi A) ,  ai ,  0-2] 
brtgd;;.  rt,  4[(niA),  fJi,  CJ2] 


brt ag„_ t  rt ,  4- 1  [(Hi  A) ,  ai ,  £72] 

mov  rt ,  f  orgetunion  rt 
jmp4[(niA),f7i,cr2] 
io:y[\Alpi:ST,p2:ST].rA[rt:\To\] 
srmov  A{xo),  rt 
junk  rt 

Iq 

h:y[\Alpi:ST,p2:ST].TA[rt:\ri\] 


inh/[\^\,Pi:ST,p2:ST].rA[rt:\Tn\] 
srmov  A{xn),Yt 
junk  rt 
In 

4nd:V[|A|,pi:5r,p2:5r].rA[rt:|r|] 
srmov  ^(x),rt 
junk  rt 

I; 

F; 

Fi; 


Fn 




Heap  values 


'I'  h/j  hval :  r  hval]  H 


'i>]  ai'.Ki, . . . ,  ak'.Kk]  xi'.Ti, . . 

•  1  ■2^1  •‘^^1?  •  •  •  ;  ^n‘4^ni  ^2  ret  ^  1 1 

Where  ^  is  a  good  allocator  for  e 
and  ai  =  (cont(m  +  2*  n){pi){p2){\T\))  >32  0-32  o  ae4  o  pi 
and  (T32  =  |ti|  >32  •  •  •  >32  kml  >32  e 
and  CJ64  =  I  </>l  I  >64  •  •  •  >64  1 |  >64  C 
and  (T2  =  P2 
and  I  =  frmsz(^) 

and  hval  =  code[ai:Ki, . . .  ,ak:Kk,  pv-ST,  p2:ST]\  Code(ri, . . .  ,rm)(</>i,  •  • .  ,(/>n)(r)| 
salloc  1] 

srmov  ^(xi), sp(/ +  0); 


srmov  A{xm),  sp(/  +  m  —  1); 
srfmov  ^(zi),  sp(/ +  m  +  2  *  0); 


srfmov  sp(Z  +  m  +  2  *  (n  —  1)); 

/ 

\-h  code^[ai:Ki, . . . ,  afc:Kfc](xi:ri, . . .  ,Xm-Tm){zi:4>i, . . . ,  ZnAn) -e  ■. 

V[ai:Ki, . .  .,ak-.Kk]  Code(Ti, . . .  ,rm)((/)i, . . .  ,(/>n)(T)  hval]  F 


Heaps 


h  d  H 


'I'  h  e  e 


'h[£:r]  h/j  hval :  r  hval']  F  'h[£:r]  \-  d  F' 


h  d,  t.T  ^  hval  ^  t.\T\.hval' ,  F]  F' 


Programs 


\-  p:T  P 


^ (d)  d  ^  H'  'l'(d);  •;  •;  A,  e,  e  hhait  e :  t  ^  I' ]  F 
Where  ^  is  a  good  allocator  for  e 
and  H  =  F]H', 

^ihandie:  Exnhandler(e) 


mov  rt,ri; 
haltpyj! 

and  R  =  {ri  ns32,  r2  ns,  Ye  ns^2-,  ns,  fi  nsQ4,  (2  ^  nsQ4,  ft  nsQ4,  sp  e} 
and  I  =  malloc  re[Exnhndler(e),  e](fihandle5  sp);  I' 


•  H  letrec  d  in  e  :  r  H,R,I 
Chapter  9

Implementation 


In the previous chapters I developed a series of translations mapping programs in an idealized
version of the original TILT internal language (the MIL) down to an idealized typed assembly
language (TILTAL) and proved the soundness of these translations. In this chapter I describe a
new backend for the TILT compiler that implements a concrete version of these translations.

The overall structure of the new backend can be seen in figure 9.1. The implementations of
the translations generally follow quite closely on the formal description, and the LIL language as
implemented is almost exactly the same as the formal version. The implementation differs most
significantly from the formal presentation in that its final target is the TALx86 language [MCG+99,
GM00]. Unlike the RISC-like TILTAL language presented in chapter 7, TALx86 is specialized
to the x86 architecture. Moreover, TALx86 uses numerous additional technologies (such as alias
types [WM01]) that I do not attempt to account for in my formal presentation. The TALx86
language was chosen as the target because of the numerous tools already existing for it (such as
an assembler, typechecker, and linker), which I was able to use with minimal modification.

In this chapter, I describe the individual passes added to the compiler as part of my
implementation work and discuss briefly the important differences between the implementation and the
theoretical presentation. In order to demonstrate the practicality of my approach, I present some
measurements to quantify the size of the typed binaries and their runtime performance as compared
to other compilers.


9.1  Singleton  Elimination 

A  key  issue  in  typed  compilation  is  controlling  the  size  of  the  intermediate  forms  of  programs.  Type 
annotations  on  internal  representation  terms  quickly  come  to  dominate  the  overall  representation 
size  to  the  point  that  representing  and  traversing  the  type  information  becomes  by  far  the  dominant 
performance  issue.  However,  there  is  in  general  a  great  deal  of  redundancy  in  this  type  information 
which  can  be  exploited  to  eliminate  or  reduce  this  problem. 

A number of mechanisms have been suggested [Sha97, PCHS00, GM00] to deal with
this problem. In the TILT compiler, the MIL as implemented controls type sizes by using singleton
kinds to provide a form of type definition [PCHS00, SH99] intrinsic to the language. Types are
bound to variables using a derived let form so that the type can be referred to subsequently via
its name. Compiler passes such as common sub-expression elimination find redundant uses of types
and reduce them to a single binding.

[Figure 9.1: Structure of the certifying TILT backend. Labels shown: LIL (Typed), Optimize, LIL (Typed), LIL to TALx86, TALx86 (Typed), Assemble and link, Typed binary files, TALx86 typecheck.]

The type theory of singleton kinds has been greatly simplified by recent work. However, rather
than attempt to reconcile this type theory with the LX type theory used for type analysis, I chose
for the purposes of the formal translation to use a singleton-free form of the MIL, and consequently
to rely in the implementation of the new backend on other technologies to control the size of type
information. Since the TILT compiler produces internal representations using singletons, the first
additional pass to be added to the compiler was a singleton elimination pass.

Crary [Cra00] showed that the eliminability of singleton kinds follows from the completeness of
the equivalence algorithm given by Stone and Harper [SH99] and gave an algorithm for computing
the singleton-free version of a term. The implementation of singleton elimination in the compiler
follows Crary's algorithm almost exactly, with one minor change in order to avoid problems with
type sizes, as explained below.

9.1.1  Preserving  sharing  of  types 

It would in theory have been possible to implement singleton elimination concurrently with
the translation from the MIL to the LIL language. Since the LIL language has its own mechanisms
for preserving sharing (discussed below in section 9.6), the result would have been a translation
which neither traversed types repeatedly, nor generated internal forms with excess redundant type
information. However, doing these two translations simultaneously seemed likely to be difficult and
error-prone. Moreover, the singleton elimination phase introduces a large amount of additional
constructor data that benefits from being exposed to the MIL optimization passes. Eliminating
singletons concurrently with the translation to LIL would have prevented this.

For these reasons, singleton elimination was implemented in TILT in two phases. The early
phase eliminates all singletons except uses which occur via the derived let form (an immediate
application of a type lambda to a type in the underlying calculus). Since the derived elimination
rule for the let construct is simple substitution, it is trivial to postpone the elimination of the
derived let type construct until the translation to LIL, providing the MIL optimizer with an
opportunity to improve the types emitted by the elimination phase^. This two-phase approach
allows singleton elimination to be done early in the compilation process while still preserving a
compact representation using the let definitional mechanism until translation to LIL.
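
To make the "simple substitution" reading of the derived let concrete, here is a small Python sketch that eliminates let-bound type definitions by substitution. It is an illustration only, not TILT's implementation: the tuple encoding of types (`("var", name)`, `("let", name, bound, body)`, and arbitrary other constructors) and the function name `eliminate_lets` are assumptions introduced for the example.

```python
def eliminate_lets(ty, env=None):
    """Remove derived let forms from a type by substitution (illustrative).

    Types are encoded as tuples (an assumed encoding):
      ("var", name)               a type variable
      ("let", name, bound, body)  let name = bound in body
      (tag, child, ...)           any other constructor
    env maps let-bound names to their already-eliminated definitions.
    """
    env = {} if env is None else env
    tag = ty[0]
    if tag == "var":
        # Replace a let-bound name by its definition; leave other names alone.
        return env.get(ty[1], ty)
    if tag == "let":
        _, name, bound, body = ty
        bound = eliminate_lets(bound, env)
        # The new binding shadows any outer binding of the same name.
        return eliminate_lets(body, {**env, name: bound})
    # Any other constructor: rebuild it with eliminated children.
    return (tag,) + tuple(
        eliminate_lets(c, env) if isinstance(c, tuple) else c for c in ty[1:]
    )
```

Applied to `let t = pair(int, int) in t -> t`, the sketch returns an arrow type containing two copies of the pair type. The duplication is the loss of sharing that motivates keeping the let form for as long as possible.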

9.1.2  Optimizations  for  singleton  elimination 
Eta  reduction 

The singleton elimination pass introduces many new types that turn out to be eta-expansions of
variables. It is straightforward to instrument the singleton elimination pass to catch many of these
eta-expansions in order to perform the eta-reduction in place. However, even with this improvement
on the original algorithm, singleton elimination still produces a fair number of eta-redices.
Consequently, performing eta-reduction after singleton elimination is an important optimization to avoid
redundant representation of types, as well as to reduce the size of the intermediate representation.
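
The following Python sketch shows eta-reduction for the function case only, on a minimal lambda-term encoding. The encoding (`("var", x)`, `("lam", x, body)`, `("app", f, a)`) and the function names are assumptions made for illustration; the MIL constructor language has more forms than this (records, for example), which the sketch does not cover.

```python
def free_vars(t):
    """Free variables of a term in the assumed tuple encoding."""
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])  # "app"


def eta_reduce(t):
    """Rewrite (lam x. f x) to f wherever x is not free in f, bottom-up."""
    tag = t[0]
    if tag == "var":
        return t
    if tag == "app":
        return ("app", eta_reduce(t[1]), eta_reduce(t[2]))
    _, x, body = t
    body = eta_reduce(body)
    if body[0] == "app" and body[2] == ("var", x) and x not in free_vars(body[1]):
        return body[1]
    return ("lam", x, body)
```

Because the rewrite is applied bottom-up, a nested expansion such as `lam a. lam b. (f a) b` collapses all the way to `f`, while `lam a. a a` is left unchanged since `a` occurs free in the function position.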

^An  alternative  way  of  viewing  this  implementation  choice  is  as  mapping  into  a  calculus  which  can  be  given  two 
different  (but  presumably  equivalent)  theories:  one  which  provides  a  let  form  derived  via  singleton  kinds,  and  one 
which  provides  a  primitive  let  form  with  a  substitution  semantics.  The  equivalence  of  the  two  theories  would  have 
to  be  shown,  however. 
[Figure 9.2: Benchmark timings with singleton elimination (normalized to timings without singleton elimination). Legend: SingElim, NoSingElim.]


Beta  reduction 

In addition to eta-redices, singleton elimination produces numerous beta-redices in the form of
projections from known records or applications of known functions. Eliminating the former is always
a desired optimization for code in named form since it can only reduce code size and eliminates an
address calculation and a memory fetch. Eliminating the latter is a speculative optimization since
the function may be called more than once. Consequently, even if it were practical to do these
optimizations concurrently with singleton elimination, it is almost certainly preferable to leave this
up to the general optimization and inlining code which must deal with these issues in any case.

Common  sub-expression  elimination 

Finally, singleton elimination may produce several copies of equivalent types. It is therefore valuable
to  do  common  sub-expression  elimination  (CSE)  on  the  resulting  code.  It  is  particularly  important 
after  singleton  elimination  that  CSE  unify  alpha-equivalent  type  functions,  since  it  is  not  uncommon 
for many of the generated types to be simple alpha-variants.
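
One common way to make CSE insensitive to bound-variable names is to compare terms by a name-free key, such as a de Bruijn-indexed form. The sketch below assumes a toy term representation (nested tuples) and hypothetical helper names `canon` and `cse`; it is meant only to illustrate the idea, not the TILT implementation, and it does not rewrite later bindings that mention a replaced name.

```python
def canon(t, env=()):
    """Return a name-free key for t.

    t is one of ("var", x), ("lam", x, body), or ("app", f, a).
    Bound variables are replaced by their distance to the binder.
    """
    tag = t[0]
    if tag == "var":
        name = t[1]
        if name in env:
            return ("bound", env.index(name))  # innermost binder comes first
        return ("free", name)
    if tag == "lam":
        return ("lam", canon(t[2], (t[1],) + env))
    return ("app", canon(t[1], env), canon(t[2], env))


def cse(bindings):
    """bindings: list of (name, term). Returns (kept bindings, renaming)."""
    seen, kept, renaming = {}, [], {}
    for name, term in bindings:
        key = canon(term)
        if key in seen:
            renaming[name] = seen[key]  # reuse the earlier, equivalent binding
        else:
            seen[key] = name
            kept.append((name, term))
    return kept, renaming
```

With this keying, two type functions that differ only in the choice of parameter name map to the same key, so the second is replaced by a reference to the first.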

9.1.3  Runtime  behavior 

Since  types  must  be  represented  as  data  in  the  TILT  compiler,  it  is  important  to  consider  the 
effect  of  singleton  elimination  on  the  runtime  behavior  of  programs.  While  this  information  is 
not  easily  available  for  the  certifying  backend  since  it  relies  intrinsically  on  singleton  elimination, 
code  can  be  generated  for  the  untyped  Sparc  backend  using  either  normal  or  singleton-free  internal 
representations. Once singletons are eliminated from the internal representations of the program,
the remaining MIL optimization passes can be run as usual, since they introduce no uses of singleton
kinds  except  in  the  form  of  the  derived  let.  For  the  most  part  these  passes  behave  identically  when 
targeting  both  the  untyped  and  the  typed  backends,  up  until  the  point  where  the  translation  to 
the  next  intermediate  language  takes  place. 

Figure  9.2  shows  that  across  a  wide  range  of  benchmarks,  the  singleton-free  code  generally 
performs  similarly  to  the  non-singleton  eliminated  code.  There  was  almost  no  difference  in  object 
code  size  between  the  two  generated  executables. 

I  speculate  that  the  occasional  improvement  in  the  runtime  behavior  can  be  attributed  to  the 
singleton-elimination  phase  acting  as  a  rudimentary  form  of  cross-module  inlining,  by  extracting 
all possible definitional information about abstract types from their kinds, and exposing that information
to the optimizer. In principle, an optimization pass could be implemented to perform a
similar  function  without  actually  doing  singleton  elimination. 


9.2  MIL  to  LIL 

The  implemented  translation  of  MIL  code  into  LIL  code  follows  the  formal  translation  from  chapter 
5  almost  exactly  in  structure.  The  only  theoretically  significant  change  is  that  in  addition  to  the 
vararg  and  array  optimizations  presented  in  chapter  5,  the  implemented  version  performs  a  related 
representation  optimization  for  sums.  This  optimization  adds  no  significant  complication  to  the 
translation,  merely  requiring  that  the  encodings  of  types  distinguish  between  those  types  which  are 
inhabited  by  heap  pointers  and  those  which  are  inhabited  by  other  values.  This  distinction  is  used 
to take advantage of the fact that pointers are never small integers. Sums with exactly one value-carrying
arm can be represented specially when the carried value is a pointer: there is no need to
box  and  tag  the  pointer  since  it  can  always  be  distinguished  from  small  integer  tags. 
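
The representation trick can be sketched by modelling machine words as integers. The threshold `MAX_TAG` and the function names below are assumptions made for illustration; the actual bound on small integer tags used by TILT is not stated here.

```python
MAX_TAG = 255  # assumption: every heap address is strictly greater than this


def inject_tag(tag):
    """A non-value-carrying arm is represented by its small integer tag."""
    assert 0 <= tag <= MAX_TAG
    return tag


def inject_pointer(addr):
    """The single value-carrying arm stores the pointer itself, unboxed."""
    assert addr > MAX_TAG
    return addr


def case(word, on_tag, on_pointer):
    """Discriminate the two kinds of arm with one comparison."""
    return on_tag(word) if word <= MAX_TAG else on_pointer(word)
```

An option-like sum over a pointer type would then use `inject_tag(0)` for the empty arm and `inject_pointer(p)` for the other, with no allocation for either.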

The MIL intermediate representation which is passed to the translator is singleton-free, except
with respect to let-bound definitions as discussed above. Some care is required in the implementation
to avoid expanding out type definitions indiscriminately, and also to avoid unnecessary work.
For example, consider a MIL term of the form let a = c in e. The bound variable a may
potentially  occur  many  times  in  the  body  of  e.  Moreover,  it  is  not  syntactically  apparent  whether 
a  is  used  as  data  (and  hence  requires  static  encoding  and  dynamic  encoding)  or  as  a  classifier  (and 
hence  requires  static  encoding  and  interpretation),  or  as  both,  or  neither.  The  definition  of  singleton 
elimination  tells  us  that  the  correct  translation  of  this  term  will  be  equivalent  to  the  translation  of 
e[c/a]  (since  this  is  the  intended  meaning  of  the  residual  let  binding).  However,  simply  expanding 
out  the  definition  via  substitution  and  translating  the  resulting  terms  is  unacceptably  inefficient, 
both  in  terms  of  the  size  of  the  resulting  intermediate  representation  and  in  terms  of  the  cost  of 
repeated translation of the type. The earlier compiler passes will have ensured that any such definition
remaining is used at least once: often it will be the case that the binding is used many times
(since  similar  bindings  are  unified  via  common-sub-expression  elimination). 

The  implementation  avoids  the  cost  of  re-traversing  these  bindings  by  maintaining  a  mapping 
from  MIL  type  variables  to  memoized  thunks  that  when  forced,  compute  the  static  or  dynamic 
encodings  of  variables  as  needed.  In  this  way,  no  type  binding  is  ever  translated  more  than  once  as 
a  type  and  once  as  a  constructor.  Moreover,  the  memoization  ensures  that  all  of  the  occurrences 
of  the  variable  will  be  replaced  by  the  same  physically  shared  translation,  preventing  an  explosion 
in  the  representation  size. 
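
The memoization scheme can be sketched as follows. The class names `Thunk` and `TypeEnv`, and the two translator callbacks, are hypothetical stand-ins for the corresponding parts of the translator; the point is only that each encoding is computed at most once and then physically shared.

```python
class Thunk:
    """A suspended computation whose result is cached after the first force."""

    def __init__(self, compute):
        self._compute = compute
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._compute()
            self._forced = True
            self._compute = None  # drop the suspended computation once it has run
        return self._value


class TypeEnv:
    """Maps type variables to lazily computed static and dynamic encodings."""

    def __init__(self, translate_static, translate_dynamic):
        self._translate_static = translate_static
        self._translate_dynamic = translate_dynamic
        self._static = {}
        self._dynamic = {}

    def bind(self, var, mil_type):
        self._static[var] = Thunk(lambda: self._translate_static(mil_type))
        self._dynamic[var] = Thunk(lambda: self._translate_dynamic(mil_type))

    def static_encoding(self, var):
        return self._static[var].force()

    def dynamic_encoding(self, var):
        return self._dynamic[var].force()
```

A variable that is used only as a classifier never triggers the dynamic translation, and a variable that is used many times yields the same object on every lookup.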


9.3 Closure Conversion


Functions  in  the  LIL  intermediate  forms  produced  by  the  translation  from  MIL  code  are  still 
lexically  nested  and  may  refer  to  variables  bound  in  enclosing  scopes.  In  order  to  provide  an  efficient 
compiled  implementation,  a  commonly  used  strategy  is  to  replace  uses  of  functions  (evaluated  via 
substitution)  with  uses  of  closed  code  and  environments.  Environments  preserve  the  values  of  free 
variables,  and  are  passed  as  additional  arguments  to  code  at  runtime.  Functions  in  the  high  level 
language  become  pairs  of  code  pointers  and  environments  in  the  underlying  machine. 

9.3.1  Closure  conversion  strategies 

Numerous  strategies  have  been  proposed  for  handling  closure  conversion  in  a  typed  setting  [CWM98, 
MWCG97, MMH96, MH98]. I briefly summarize the two main axes of variation from an implementation
standpoint, and refer the reader to Morrisett and Harper [MH98] for a detailed analysis
of  the  different  approaches  and  their  typing  properties. 

The  first  source  of  variation  has  to  do  with  whether  recursion  is  implemented  via  a  primitive 
notion of recursive code, or via the introduction of recursive closures. In the recursive code approach,
code functions are permitted to recursively refer to themselves and other code functions in
the same nest within their own scope. In the recursive closure approach, all code is non-recursive.
Closures, however, may refer back to themselves.

Many  recursive  closure  implementations  are  based  upon  the  idea  of  backpatching,  in  which  the 
environment  record  is  initialized  to  contain  a  pointer  to  the  closure  into  which  it  will  eventually  be 
written.  While  simple  and  expedient  to  implement,  this  approach  requires  either  a  fairly  powerful 
type  system  to  type  the  resulting  code,  or  else  the  use  of  ad-hoc  “dummy  closures”  to  stand  in 
for  the  recursive  reference  in  the  initially  allocated  environment.  In  the  interest  of  avoiding  adding 
extra  complexity  to  the  problem,  I  chose  to  avoid  this  approach. 

Another  approach  to  implementing  recursion  is  to  parameterize  code  functions  over  their  entire 
closure,  instead  of  just  the  environment  portion.  Since  the  closure  contains  the  code  pointer  to 
which  it  is  being  passed,  this  approach  is  sometimes  referred  to  as  the  self-application  approach. 

Self-application  is  a  fairly  elegant  approach  to  solving  the  problem  of  closure  converting  recursive 
code.  However,  it  would  require  a  fair  bit  of  re-tooling  in  the  existing  TILT  compiler  in  order  to 
implement.  In  particular,  the  MIL  type  system  does  not  support  recursive  types  at  higher  kinds, 
which  would  seem  to  be  necessary  to  efficiently  support  the  self  application  semantics. 

For  the  sake  of  simplicity  in  the  implementation  of  closure  conversion  then,  I  chose  to  implement 
the  relatively  straightforward  recursive  code  approach  to  closure  conversion.  Under  this  approach, 
mutually recursive function nests become mutually recursive code nests, parameterized over a
common  environment.  Functions  become  existentially  packed  tuples  containing  the  code  pointer 
and  the  environment  of  the  function.  At  call  sites,  the  code  pointer  and  environment  are  projected 
out  of  the  tuple,  and  the  former  is  applied  to  the  latter,  along  with  the  original  arguments  of  the 
function. 
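
As a rough illustration of the shape of the converted program, consider a nest of two mutually recursive functions `even` and `odd` that share one free variable `step`. The sketch below writes out by hand what a recursive-code conversion might produce; all names are hypothetical, and the existential packaging of the environment type has no counterpart in an untyped sketch like this one.

```python
# Closed code for the nest: each code function takes the shared environment
# as an extra first argument and calls its sibling's code directly.
def even_code(env, n):
    (step,) = env
    return True if n == 0 else odd_code(env, n - step)


def odd_code(env, n):
    (step,) = env
    return False if n == 0 else even_code(env, n - step)


def make_even(step):
    env = (step,)             # one environment allocated for the whole nest
    return (even_code, env)   # a closure is a (code pointer, environment) pair


def apply_closure(closure, *args):
    code, env = closure       # project both components at the call site
    return code(env, *args)   # apply the code to the environment and arguments
```

For example, `apply_closure(make_even(1), 10)` runs the whole recursion through direct code calls and allocates only the single closure returned by `make_even`.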

9.3.2  Recursive  code  closure  conversion 

There  are  two  well-known  disadvantages  to  the  recursive  code  approach  to  closure  conversion. 

The  first  is  that  since  each  function  must  be  able  to  create  the  environment  of  each  of  its 
callees,  the  environment  of  each  function  in  the  nest  must  contain  the  transitive  closure  of  all  of 
the free variables of all of the functions in the nest (assuming that nests have been reduced to
strongly  connected  components).  In  practice,  this  means  that  all  of  the  functions  in  the  nest  share 
one  environment,  which  is  consequently  larger  than  the  individual  environments  might  be.  This  is 
somewhat  offset  by  the  fact  that  only  one  environment  per  nest  need  be  allocated. 

The  second,  and  potentially  more  serious  drawback  of  the  recursive  code  approach  is  that  an 
escaping  occurrence  of  a  function  within  its  own  body  requires  the  closure  to  be  reconstructed 
on  each  invocation.  For  functions  implementing  loops,  this  can  obviously  be  problematic.  It  is 
important  to  note  however  that  it  is  not  necessary  to  re-allocate  the  environment  on  each  invocation: 
only  the  two  element  closure.  Moreover,  this  particular  idiom  is  relatively  rare:  most  recursive 
functions  do  not  escape  within  their  body.  Nonetheless,  this  remains  a  serious  disadvantage  of  this 
approach. 

The advantage of this approach is that it is simple to implement, and moreover is relatively
simple  to  implement  efficiently.  Most  applications  of  local  functions  (including  all  recursive  calls) 
can  be  implemented  as  calls  directly  to  code,  without  reference  to  closures.  Consequently,  for  most 
non-escaping  functions  closures  need  never  be  allocated. 

More  specifically,  as  implemented  in  the  LIL  backend,  every  recursive  call  (including  all  calls 
to  other  functions  in  the  same  nest)  is  implemented  as  a  direct  code  call.  In  addition,  every  call 
to  a  function  defined  in  the  same  lexical  scope  as  the  call  is  implemented  as  a  direct  code  call. 
It  is  straightforward  to  extend  this  to  functions  defined  in  any  enclosing  scope:  however  this  may 
increase  closure  sizes  and  hence  is  not  done  in  the  LIL  backend. 

In  addition  to  calling  code  functions  directly,  some  additional  optimizations  are  performed  on 
the  calling  sequences  of  non-escaping  functions  (that  is,  functions  which  are  only  called  directly 
as  code).  For  such  functions,  the  environment  record  is  never  actually  constructed.  Instead,  the 
components  of  the  environment  are  passed  as  additional  arguments  to  the  function  (just  as  with  the 
varargs  optimization  for  ordinary  records).  Floating  point  (64  bit)  elements  of  the  environment  are 
further  optimized  by  passing  them  unboxed  on  the  stack.  For  all  functions,  environments  with  only 
a  single  element  are  “unwrapped”:  that  is,  the  underlying  singleton  element  is  passed  unboxed, 
instead  of  boxed  in  an  environment.  Finally,  all  closed  closures  are  statically  allocated  as  data 
(a  closure  may  be  closed  because  its  environment  is  empty,  or  because  all  of  its  constituents  are 
statically  allocated  data). 

One  of  the  advantages  of  using  a  type  erasable  language  is  that  the  treatment  of  types  in  closure 
conversion  is  much  simpler  than  in  the  type  passing  case.  Since  types  are  used  at  compile  time 
only,  type  environments  can  be  passed  immediately  via  application,  instead  of  being  packed  into 
an  existential.  Implementing  closure  conversion  for  types  was  straightforward. 


9.4  General  Optimization 

The  process  of  encoding  types  as  term  data  and  making  typecase  explicit  exposes  new  structure  to 
the  compiler  which  is  available  for  optimization.  Moreover,  it  greatly  complicates  other  phases  of 
the  backend  to  be  overly  concerned  with  generating  no  dead,  redundant,  or  otherwise  sub-optimal 
code.  Consequently,  it  seemed  most  expedient  to  port  the  MIL  one  pass  optimizer  to  the  LIL  so 
that  basic  optimizations  could  be  performed  on  LIL  programs.  The  LIL  optimizer  performs  CSE, 
dead  code  elimination,  eta  reductions  for  records  and  functions,  projection  from  known  records, 
reduction  of  known  switches,  folding  of  boxes  and  unboxes,  constant  folding,  and  reduction  of 
coercions. Additionally, it makes some effort to recognize boolean terms which are used only as
arguments to switches, and hence which can be compiled to produce jumps to branch targets
directly from comparisons, instead of evaluating the term to produce a boolean value, and then
re-branching on this result.

The  general  optimizer  is  most  useful  to  help  clean  up  after  the  closure  converter,  since  closure 
conversion  involves  a  fairly  substantial  rewriting  of  programs.  Many  box/unbox  reductions  become 
available  after  closure  conversion,  since  floating  point  terms  are  currently  boxed  in  environments. 
Numerous  record  projections  and  record  formations  can  be  eliminated  as  well  in  cases  where  only 
some  or  none  of  the  environment  is  used  in  a  given  context.  Multiple  calls  to  the  same  function  in 
the  same  scope  can  be  optimized  so  that  only  one  projection  of  the  code  pointer  and  the  environment 
need  be  done. 

All  these  optimizations  (and  more)  are  performed  on  the  closure  converted  code.  In  general 
however,  a  great  deal  more  could  be  done  to  optimize  after  closure  conversion.  In  particular,  I 
believe that the output of the closure converter could benefit greatly from some form of partial-redundancy
elimination which could move code evaluated on only some paths down into the arms
of  switches.  The  closure  converter  introduces  numerous  projections  from  a  function  environment  in 
the  function  header,  some  of  which  are  only  used  conditionally  in  the  function.  It  may  also  be  forced 
to  re-allocate  closures  for  recursive  escaping  occurrences  which  are  again  only  used  conditionally  in 
the  function.  Such  optimization  is  beyond  the  scope  of  this  thesis. 


9.5 LIL to TAL

The  implemented  translation  from  LIL  to  TALx86  is  conceptually  quite  similar  to  the  formal 
description  given  in  chapter  8,  despite  the  fact  that  the  TALx86  language  is  substantially  different 
from  the  idealized  TILTAL  language.  While  details  of  the  implementation  of  particular  constructs 
are specialized to the TALx86 type system, the overall code generation approach is very similar.

Code  generation  and  register  allocation  are  done  simultaneously  in  (notionally)  a  single  pass. 
In  fact,  for  simplicity  the  code  generator  is  run  twice  on  each  function:  once  using  a  virtual  frame 
pointer  and  once  using  stack  pointer  based  frames.  The  output  of  the  first  run  is  ignored:  it  is 
done  only  to  determine  the  frame  size  so  that  offsets  from  the  stack  pointer  can  be  computed 
directly.  This  was  done  for  the  sake  of  simplicity  in  the  coding  only:  it  would  be  straightforward 
to  implement  a  second  pass  to  eliminate  uses  of  the  frame  pointer. 

9.5.1  Code  generation 

As  with  the  formal  description,  code  generation  is  done  by  translating  terms  with  a  pre-assigned 
destination  in  a  similar  fashion  to  that  described  by  Dybvig  et  al.  [DHB90].  This  is  particularly 
convenient  given  the  register  allocation  strategy  chosen,  in  which  let-bound  variables  generally 
have  already  been  assigned  a  location  when  their  bindings  are  translated.  Consequently,  operations 
which are bound to variables can often be translated in such a way as to compute their values
directly  into  the  appropriate  location. 

As in the formal development from chapter 8, terms are also translated with a context of
occurrence  which  indicates  the  use  to  which  the  result  of  a  computation  is  to  be  put.  For  example, 
terms  which  are  translated  in  a  “return”  context  are  expected  to  de-allocate  the  frame  and  return 
directly,  instead  of  returning  a  value.  Consequently,  general  tail-call  elimination  falls  out  very 
naturally:  function  bodies  that  end  in  function  calls  (even  when  guarded  by  a  conditional  or  a 
switch) simply de-allocate the frame and jump to the new callee. (Some additional adjustment of
the stack is necessary when the callee has a different number of arguments than the caller.)

Additional  contexts  of  occurrence  are  used  to  generate  efficient  code  for  conditional  branches, 
and  for  terms  evaluated  only  for  side-effects.  A  term  whose  value  is  used  only  to  decide  a  branch  is 
evaluated  in  a  context  of  occurrence  which  provides  appropriate  continuations  which  can  be  called 
directly.  A  term  whose  value  is  unused  is  evaluated  in  a  context  of  occurrence  which  indicates  that 
no  return  value  need  be  provided. 
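
The role of the context of occurrence can be illustrated with a toy generator that emits pseudo-instructions as strings. Everything in this sketch is an assumption made for illustration: the term forms, the context encoding, and pseudo-instructions such as `pop_frame` are invented, and the sketch emits code in forward order, whereas the backend described in this chapter generates code bottom-up.

```python
import itertools

_labels = itertools.count()  # source of fresh label numbers


def gen(term, ctx, out):
    """Append pseudo-instructions for term to the list out and return it.

    term is ("const", n), ("call", f), or ("if", reg, then_term, else_term).
    ctx is ("return",) or ("dest", register_name).
    """
    kind = term[0]
    if kind == "const":
        target = "eax" if ctx[0] == "return" else ctx[1]
        out.append(f"mov {target}, {term[1]}")
        if ctx[0] == "return":
            out += ["pop_frame", "ret"]
    elif kind == "call":
        if ctx[0] == "return":
            out += ["pop_frame", f"jmp {term[1]}"]  # tail call: no new frame
        else:
            out += [f"call {term[1]}", f"mov {ctx[1]}, eax"]
    elif kind == "if":
        _, cond_reg, then_term, else_term = term
        n = next(_labels)
        out.append(f"jz {cond_reg}, else_{n}")
        gen(then_term, ctx, out)                    # arms inherit the context
        if ctx[0] != "return":
            out.append(f"jmp join_{n}")
        out.append(f"else_{n}:")
        gen(else_term, ctx, out)
        if ctx[0] != "return":
            out.append(f"join_{n}:")
    return out
```

Because the arms of a conditional inherit the enclosing context, a call guarded by a conditional in return position is emitted as a jump without any special-case analysis for tail calls.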

9.5.2  Register  allocation 

The  assignment  of  variables  and  temporaries  to  registers  and  stack  slots  is  performed  simultaneously 
with  code  generation.  Code  generation  is  performed  in  a  bottom  up  fashion,  so  that  the  live  range 
of  variables  is  apparent  from  the  binding  structure  of  a  program:  a  variable  becomes  live  when 
it  is  first  encountered  in  the  bottom-up  pass,  and  becomes  dead  when  its  binding  site  is  reached. 
Registers  and  stack  slots  are  assigned  to  variables  greedily,  as  needed  by  the  generated  code.  When 
a  variable  use  is  encountered  by  the  code  generator,  the  register  allocator  is  queried  to  provide  the 
machine  location  in  which  the  variable  resides.  The  code  generator  dictates  whether  or  not  this 
location  is  permitted  to  be  a  stack  slot,  or  must  be  a  register. 

Register  assignment  and  spilling 

The  register  allocator  maintains  state  describing  the  assignment  of  variables  to  registers  and  stack 
slots  and  vice  versa.  This  state  encodes  the  assumptions  which  the  previously  emitted  code  makes 
about  the  state  of  the  machine  upon  entry.  When  the  register  allocator  is  queried  about  a  variable 
to  which  it  has  already  assigned  a  machine  location  it  returns  the  previously  assigned  location  if  it  is 
appropriate.  However,  if  the  variable  was  previously  assigned  to  a  stack  slot  and  the  code  generator 
requires  that  the  use  under  consideration  be  assigned  a  register,  the  register  allocator  chooses  a  free 
register  and  changes  the  state  to  map  the  variable  to  this  new  location.  Since  previously  emitted 
code  assumes  a  different  location  for  the  variable,  the  register  allocator  is  responsible  for  emitting 
code  to  initialize  the  previously  assigned  stack  slot  from  the  newly  assigned  register. 

If  a  variable  is  required  to  be  in  a  register  and  all  registers  are  already  assigned  to  variables, 
a  variable  must  be  chosen  to  spill  to  the  stack.  Since  previously  emitted  code  assumes  that  the 
variable  was  in  a  register,  the  allocator  must  emit  code  to  load  the  register  from  the  newly  assigned 
stack  slot.  As  a  simple  spill  heuristic,  the  variable  whose  next  use  is  farthest  into  the  future  (that 
is,  whose  next  use  is  farthest  forward  in  the  instruction  stream)  is  spilled.  This  is  intended  to 
attempt  to  provide  more  intervening  instructions  to  hide  the  latency  of  the  memory  access,  as  well 
as  to  attempt  to  keep  locally  active  variables  in  registers.  No  effort  was  put  into  attempting  to 
validate  this  choice  of  heuristic  as  extensive  tuning  of  the  code  generator  is  beyond  the  scope  of 
this  thesis.  However,  from  informal  inspection  of  the  generated  code,  the  allocator  seems  to  usually 
make  reasonable  choices,  though  there  are  also  cases  where  it  could  be  improved. 
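
The spill heuristic itself amounts to a one-line selection. The sketch below assumes a hypothetical table `next_use` giving, for each variable, the position of its next use in the instruction stream; how the allocator actually tracks this information is not specified here.

```python
def choose_spill(in_registers, next_use):
    """Pick the register-resident variable whose next use is farthest away.

    in_registers: iterable of variable names currently held in registers.
    next_use: dict mapping a variable to the index of its next use; a
    variable with no recorded use is treated as infinitely far away.
    """
    return max(in_registers, key=lambda v: next_use.get(v, float("inf")))
```

For instance, with next uses `{"a": 3, "b": 10, "c": 5}` the heuristic selects `b`, keeping the more locally active `a` and `c` in registers.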

If  a  location  is  requested  for  a  previously  unseen  variable,  it  is  assigned  a  register  if  possible, 
and  otherwise  a  stack  slot  if  permitted  by  the  code  generator.  If  neither  of  these  cases  apply,  a 
variable  must  be  spilled  as  described  above  to  free  up  a  register  for  the  new  variable. 

At a variable binding site, the register allocator provides the code generator with the location
into which the variable initialization code should write its value. Variables which have not
been assigned a location when their binding site is encountered are unused, and the result of the
initialization code may be safely discarded.

Unlike  many  register  allocation  schemes  in  which  variables  are  assigned  a  single  location  with 
spill  code  emitted  to  load  them  into  temporary  registers  when  necessary,  this  register  allocation 
approach  allows  variable/location  assignments  to  be  changed  dynamically.  Variables  may  be  moved 
between  stack  slots  and  registers  multiple  times  over  the  course  of  a  function.  In  this  sense  this 
approach  is  similar  to  the  “second  chance  binpacking”  allocation  strategy  described  by  Traub,  et 
al.  [THS98]. 

Machine  state  types 

An  important  side  benefit  of  the  register  allocation  data  structures  is  that  they  provide,  at  any  given 
point  during  code  generation,  a  snapshot  of  the  machine  state.  The  register  allocator  maintains  a 
mapping  from  variables  to  registers  and  frame  locations.  Consequently,  it  is  straightforward  in  this 
framework to define the translation of a LIL context Γ as described in chapter 8. The operation
|Γ|_α mapping a stack tail α and a LIL context Γ to a machine type is trivial to implement
given  the  data  structures  already  used  by  the  register  allocator.  The  intention  is  that  the  register 
allocator  as  implemented  should  constitute  a  “good  allocator”  as  described  in  chapter  8. 

9.5.3  Additional  TILT  constructs 

There  are  many  more  language  constructs  in  the  actual  LIL  language  as  implemented  than  in  the 
formal  LIL  description.  For  the  most  part  however,  these  are  simply  additional  primitive  operations 
on  basic  types,  and  are  not  especially  interesting.  However,  two  such  constructs  warrant  additional 
comment. 

Coercions 

Vanderwaart et al. [VDP+03] describe a coercion calculus used in the TILT compiler to faithfully
implement  Standard  ML  datatypes  in  an  efficient  manner.  Applications  of  coercions  into  and  out 
of  datatypes  in  this  system  incur  zero  runtime  overhead.  This  system  is  quite  simple,  requiring 
only  one  new  type  constructor  and  three  new  term  level  constructs.  However,  it  was  not  supported 
by  the  existing  TALx86  type  system,  and  I  preferred  to  modify  this  system  as  little  as  possible. 

A  simple  technique  for  eliminating  these  coercions  is  to  replace  them  with  function  calls.  This 
approach  has  the  benefit  of  placing  no  additional  burden  on  the  code  generator  and  requiring  no 
changes to the TALx86 type system, but suffers from a substantial performance penalty [VDP+03].
As  a  compromise,  I  chose  to  implement  coercions  as  functions  with  a  highly  specialized  calling 
convention.  In  particular,  they  are  implemented  as  functions  whose  argument  and  result  are  both 
in  the  same  register,  and  for  which  all  other  registers  are  callee-save.  Since  the  actual  bodies 
of  the  functions  do  no  runtime  work,  this  means  that  the  runtime  cost  of  applying  a  coercion  is 
reduced  to  two  instructions:  a  call  and  an  immediate  return.  There  may  be  some  additional  cost 
since  variables  may  need  to  be  spilled  or  re-shuffled  in  order  to  place  the  argument  value  in  the 
appropriate  register  before  calling  the  coercion.  However,  in  general  the  overhead  is  quite  low. 

External  functions 

In  general,  most  programs  need  at  some  point  to  call  external  functions.  At  the  very  least,  any 
interesting  program  must  at  some  point  make  a  system  call  to  do  input  and  output.  These  external 
functions are probably not in general provably safe, and are in any case not available in a verifiable
form. Consequently, these external operations are generally considered part of the trusted computing
base: that is, the safety of the checked code is certified relative to the correctness and safety
of  the  external  code.  However,  it  is  important  that  there  be  some  mechanism  for  controlling  which 
external  code  the  certified  program  has  access  to,  both  so  that  the  certification  system  cannot 
be  trivially  subverted,  and  so  that  the  ability  of  mobile  code  to  access  system  resources  can  be 
controlled  by  the  host. 

One  approach  to  this  is  to  explicitly  incorporate  the  external  functionality  needed  by  programs 
into  the  runtime  layer  upon  which  the  program  runs.  Essentially,  the  source  and  target  language 
are  extended  with  a  statically  known  set  of  additional  primitives.  This  has  the  benefit  of  making  it 
easy  to  control  the  set  of  resources  available  to  programs,  and  of  routing  access  to  these  resources 
through  a  central  authority  (leaving  open  the  possibility  of  doing  additional  sandbox  style  dynamic 
checks).  A  disadvantage  of  this  approach  is  that  it  is  generally  difficult  to  extend  the  set  of  external 
functions available, since the runtime itself must be modified.

The  approach  taken  in  this  thesis  is  to  push  the  issue  of  access  to  external  resources  out  to  the 
linking  phases.  The  TILT  compiler  extends  the  Standard  ML  language  with  a  primitive  function 
type  classifying  functions  obeying  the  C  calling  convention,  and  a  primitive  application  form  for 
applying  such  functions.  Any  compilation  unit  may  declare  a  set  of  external  functions  against 
which  it  is  compiled.  The  generated  TALx86  modules  list  these  functions  as  typed  imports.  The 
code  in  the  compiled  module  is  certified  to  be  safe  under  the  assumption  that  the  imported  labels 
are  safe  at  their  declared  types. 

The  question  of  how  these  import  assumptions  are  discharged  then  becomes  a  policy  decision 
on  the  part  of  the  host  at  link  time.  The  TALx86  linker  provides  three  different  modes  by  which 
import  assumptions  can  be  discharged. 

The  first  is  that  a  typed  binary  may  be  provided  to  satisfy  the  import  assumptions.  In  this 
case,  these  assumptions  will  be  explicitly  checked  by  the  linker,  and  so  long  as  the  TALx86  type 
system  is  correct,  the  linked  result  is  guaranteed  to  be  safe. 

The  second  mode  is  that  an  untyped  binary  can  be  provided,  along  with  an  export  interface 
providing  types  for  the  exported  labels.  In  this  case,  the  TALx86  linker  can  check  that  the  import 
assumptions  of  the  client  module  match  the  exported  interface  of  the  untyped  binary,  but  cannot 
check  that  the  untyped  binary  actually  provides  the  given  interface.  Consequently,  the  safety  of 
the  linked  program  is  contingent  on  the  untyped  binary  actually  matching  its  exported  interface. 

The  third  linking  option  is  to  simply  link  an  untyped  binary  against  the  typed  client  code,  with 
no  interface  checking  whatsoever.  In  this  case  the  safety  of  the  linked  program  is  contingent  on  the 
untyped  binary  matching  the  declared  import  assumption  of  the  typed  binary. 

The  difference  between  the  second  and  the  third  case  is  important,  but  subtle.  The  key  element 
is  that  in  the  third  case,  there  are  no  restrictions  on  what  functions  the  typed  binary  can  call, 
and  what  types  it  assigns  to  them.  Consequently,  it  is  trivial  to  subvert  the  type  system  simply 
by  providing  a  typed  binary  which  imports  functions  at  incorrect  types.  In  the  second  case,  this 
is  not  possible,  since  the  host  provides  an  explicit  interface  against  which  the  typed  client  is  to 
be  linked.  So  long  as  the  host  provides  the  correct  types  for  the  exported  functions  (and  so  long 
as  those  functions  are  implemented  correctly  and  safely),  the  client  is  unable  to  subvert  the  type 
system  since  it  is  constrained  to  respect  the  exported  interface. 
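
The difference between the second and third modes can be made concrete with a toy interface check. This is not the TALx86 linker or its interface format: the function `link_errors` is hypothetical, types are represented as plain strings, and type equality is simplified to string equality.

```python
def link_errors(imports, export_interface):
    """Check client import assumptions against a host-provided interface.

    imports: dict mapping each imported label to the type the client assumes.
    export_interface: dict mapping each exported label to its declared type.
    Returns a list of error messages; an empty list means the check passed.
    """
    errors = []
    for label, ty in sorted(imports.items()):
        if label not in export_interface:
            errors.append(f"{label}: not exported by host")
        elif export_interface[label] != ty:
            errors.append(
                f"{label}: imported at {ty}, exported at {export_interface[label]}")
    return errors
```

Under the second mode a client that imports a label at the wrong type, or imports a label the host never offered, is rejected by a check of this kind. Under the third mode no such check is run, so the same client would link without complaint.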

In  the  TILT  typed  backend,  a  combination  of  the  first  two  approaches  is  used  to  implement  the 
runtime  layer  upon  which  programs  run.  Some  parts  of  the  runtime  functionality  are  implemented  in 
TALx86 and hence can be checked directly. Other parts are implemented in C, with an explicitly
provided  interface.  The  runtime  layer  as  currently  implemented  is  very  minimal  and  does  not 
provide  any  support  for  system  calls  or  any  useful  library  functions.  A  version  of  the  Standard  ML 
Basis  Library  was  modified  slightly  to  provide  this  sort  of  operating  system  functionality.  Currently, 
all  of  this  code  is  linked  using  the  third  approach.  This  is  satisfactory  for  using  the  compiler  as 
a  general  x86  backend,  but  for  use  as  part  of  a  certifying  compilation  infrastructure  it  would  be 
necessary  to  define  a  more  limited  interface  to  host  resources,  and  to  define  explicit  export  interfaces 
against  which  client  code  could  be  checked. 


9.6  Engineering 

While  extensive  engineering  of  the  TILT  certifying  backend  is  beyond  the  scope  of  this  thesis,  a 
certain  amount  of  engineering  work  was  required  simply  to  demonstrate  the  feasibility  of  certified 
compilation  in  this  setting.  This  section  describes  some  of  the  techniques  used. 


9.6.1  Type  representation 

Unlike  the  MIL,  the  LIL  lacks  an  internal  definition  mechanism  for  types.  In  the  absence  of  any 
sharing  mechanism  whatsoever,  types  are  observed  to  grow  infeasibly  large.  The  solution  to  this 
problem  used  in  the  LIL  is  to  enable  sharing  via  hash-consing. 

Hash-consing is a popular technique for controlling representation size which works by looking
up each newly allocated type in a table containing all previously allocated types. Whenever a
previously allocated version is found in the table, the two copies may be represented as pointers
to the same data structure. Hashing is used to make this reasonably efficient: each node in the
parse  tree  of  a  type  contains  the  hash  value  of  the  type,  which  can  then  be  used  to  produce  hash 
values  for  parent  nodes.  The  table  of  allocated  nodes  can  then  be  implemented  as  a  hash  table 
with  reasonably  fast  lookup  times. 
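
The scheme described above can be illustrated with a minimal sketch in Python. This is not the TILT implementation (which is written in Standard ML); the names `TypeNode`, `HashConsTable` and `make` are hypothetical, and types are reduced to a constructor tag plus children.

```python
class TypeNode:
    """A node in the parse tree of a type, with its hash value cached."""
    __slots__ = ("tag", "children", "_hash")

    def __init__(self, tag, children):
        self.tag = tag                # constructor name, e.g. "int" or "arrow"
        self.children = children      # tuple of already shared TypeNode objects
        # The parent's hash is derived from the cached hashes of its children,
        # so computing it never walks the whole type.
        self._hash = hash((tag, children))

    def __hash__(self):
        return self._hash

    # Equality is deliberately left as object identity: once every node is
    # shared, two types are syntactically equal exactly when they are the
    # same object.


class HashConsTable:
    """Table of all previously allocated type nodes (hypothetical name)."""

    def __init__(self):
        self._nodes = {}

    def make(self, tag, *children):
        key = (tag, children)
        node = self._nodes.get(key)
        if node is None:              # first allocation of this type
            node = TypeNode(tag, children)
            self._nodes[key] = node
        return node                   # otherwise reuse the existing copy

    def __len__(self):
        return len(self._nodes)


table = HashConsTable()
int_to_int = table.make("arrow", table.make("int"), table.make("int"))
same_again = table.make("arrow", table.make("int"), table.make("int"))
```

Here `int_to_int` and `same_again` are the same object, and the table holds only two nodes: one for `int` and one for the arrow type.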

Hash-consing  ensures  that  all  syntactically  equal  types  can  be  shared.  However,  types  which 
are  alpha-variants  of  each  other  will  not  be  shared  in  this  scheme  (except  accidentally)  since  the 
hash  codes  of  different  variables  will  usually  be  different.  This  problem  can  be  completely  avoided 
by switching to a representation in which variables are represented via de Bruijn indices, wherein
a  variable  use  is  implemented  as  a  count  of  the  number  of  binding  sites  between  the  variable  use 
and  the  variable  binding  site.  When  variables  are  implemented  in  this  fashion,  all  types  which  are 
alpha-equal  will  also  be  syntactically  equal,  and  hence  can  be  shared  via  hash-consing. 
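
As an illustration of why this representation makes alpha-equal types syntactically equal, the following Python sketch converts named types to index form. The tuple encoding of types and the function name `to_de_bruijn` are assumptions made for the example only.

```python
def to_de_bruijn(ty, env=()):
    """Convert a named type to de Bruijn form.

    Named types are tuples (hypothetical encoding):
      ("var", name) | ("arrow", t1, t2) | ("forall", name, body)
    `env` lists the enclosing binders, innermost first.
    """
    tag = ty[0]
    if tag == "var":
        name = ty[1]
        if name in env:
            # Count of binders between this use and its binding site.
            return ("bound", env.index(name))
        return ("free", name)
    if tag == "arrow":
        return ("arrow", to_de_bruijn(ty[1], env), to_de_bruijn(ty[2], env))
    if tag == "forall":
        # The binder name disappears; only the body is kept.
        return ("forall", to_de_bruijn(ty[2], (ty[1],) + env))
    raise ValueError(f"unknown type constructor: {tag}")


id_a = ("forall", "a", ("arrow", ("var", "a"), ("var", "a")))
id_b = ("forall", "b", ("arrow", ("var", "b"), ("var", "b")))
```

The two named types `id_a` and `id_b` differ syntactically, but their converted forms are equal, so a hash-consing table would share them.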

However, de Bruijn indices are notoriously difficult to work with, and give no guarantee of improved
sharing. Moreover, all of the existing TILT infrastructure was built around using variables.
Consequently, I chose to use a standard representation of variables. In practice, this seems to work
well so long as care is taken to avoid introducing unnecessary alpha-variants.

In addition to sharing syntactically equal types in memory, it turns out to be very important
to share kinds as well. In almost all programs, the number of syntactically distinct kind nodes is
smaller than one hundred. In the absence of hash-consing for kinds, however, the number of actual
kind nodes in memory is commonly three or four orders of magnitude larger than this. In general
throughout  this  section,  all  of  the  techniques  discussed  in  the  context  of  sharing  types  should  be 
considered  to  apply  to  kind  information  as  well. 

9.6.2 Traversing programs


Sharing  types  and  kinds  in  memory  is  crucial  in  order  to  reduce  internal  representations  of  programs 
to  a  manageable  size:  that  is,  to  reduce  the  space  usage  of  the  compiler.  However,  it  is  almost 
equally important to ensure that, in addition to avoiding redundant representation of types, the
compiler also avoids redundant computation on types. Hash-consing permits type parse trees to be
represented more compactly as directed acyclic graphs (DAGs). Compiler passes that rewrite types
represented in this form must also be careful to traverse these programs as DAGs.

Traversing  programs  in  DAG  form  is  implemented  by  providing  an  abstract  lookup  table  in 
which  hash-consed  type  nodes  can  be  inserted.  Node  lookup  tables  are  used  in  compiler  passes  to 
keep  sets  of  nodes  already  traversed:  or  more  generally,  to  keep  mappings  from  previously  traversed 
nodes  to  the  traversal  result  (or  other  useful  information).  Before  attempting  to  rewrite  a  node,  the 
compiler pass checks the table for a previously computed result. If no such result is found, the result
is  computed  and  entered  into  the  table  for  future  use.  This  allows  passes  to  avoid  re-traversing 
types  unnecessarily  without  impacting  the  structure  of  algorithms  in  any  significant  way. 
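
A minimal Python sketch of such a table-driven pass is given below. The `Node` class stands in for a hash-consed type node, and the pass itself (renaming `int` leaves to `int32`) is an invented example; only the memoization pattern is the point.

```python
class Node:
    """Stand-in for a hash-consed type node: equal types are the same object."""

    def __init__(self, tag, *children):
        self.tag = tag
        self.children = children


def rewrite(node, memo, stats):
    """Rename every "int" leaf to "int32", visiting each shared node once.

    `memo` maps previously traversed nodes to their traversal results;
    `stats` counts how many nodes were actually rewritten.
    """
    if node in memo:                  # already traversed: reuse the result
        return memo[node]
    stats["visited"] += 1
    new_children = tuple(rewrite(c, memo, stats) for c in node.children)
    new_tag = "int32" if node.tag == "int" else node.tag
    result = Node(new_tag, *new_children)
    memo[node] = result               # enter the result for future use
    return result
```

On a type whose parse tree has seven nodes but whose DAG has three, the pass does three units of work, and sharing in the input is preserved in the output.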

Compiler  passes  that  produce  different  results  based  on  contextual  information  are  slightly  more 
complicated  to  implement  in  this  way,  since  it  is  no  longer  necessarily  the  case  that  syntactically 
equal  types  rewrite  to  the  same  results.  This  situation  only  really  arose  in  the  LIL  typechecker 
which  is  discussed  extensively  below. 


9.6.3  Fast  substitution 

A  very  common  operation  in  the  compiler  is  the  substitution  of  types  for  free  type  variables  in 
other  types  (as  well  as  kinds  for  kind  variables).  In  practice,  substitution  often  proves  to  be  an 
efficiency  bottleneck  in  the  TILT  compiler.  In  order  to  make  substitution  faster,  the  LIL  backend 
maintains  sets  containing  free  type  and  kind  variables  at  each  type  and  kind  node.  This  allows 
substitutions  for  variables  to  only  traverse  those  nodes  which  actually  contain  free  occurrences  of 
the  substituted  variables. 

Another  important  benefit  of  maintaining  sets  of  free  variables  is  that  it  allows  the  substitution 
code to avoid introducing unnecessary alpha-variants of types. Capture-avoiding substitution must
in  general  alpha-vary  when  crossing  a  type  variable  binding,  since  the  bound  variable  may  occur 
free  in  the  type  being  substituted.  This  alpha-variation  may  cause  an  entirely  new  copy  of  the  type 
to  be  created,  even  if  the  substitution  only  affects  small  sub-trees  of  the  type.  This  can  be  avoided 
in  the  case  where  the  bound  variable  does  not  occur  free  in  the  type  being  substituted.  Maintaining 
sets  of  free  variables  makes  this  check  efficient. 
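
Both uses of the free variable sets can be seen in the following Python sketch. The `Ty` class, the constructor helpers and the fresh-name scheme are assumptions for illustration; in particular, fresh names contain a `%` character that is assumed never to occur in source variables.

```python
import itertools

_fresh = itertools.count()


class Ty:
    """Type node carrying its set of free variables, computed on creation."""

    def __init__(self, tag, name=None, kids=()):
        self.tag, self.name, self.kids = tag, name, tuple(kids)
        if tag == "var":
            self.fv = frozenset([name])
        else:
            fv = frozenset().union(*(k.fv for k in self.kids))
            self.fv = fv - {name} if tag == "forall" else fv


def var(x):
    return Ty("var", x)


def arrow(a, b):
    return Ty("arrow", kids=(a, b))


def forall(x, body):
    return Ty("forall", x, (body,))


def subst(ty, x, s):
    """Capture-avoiding substitution of s for x in ty."""
    if x not in ty.fv:
        return ty                     # cut-off: no free occurrence below here
    if ty.tag == "var":
        return s                      # ty must be the variable x itself
    if ty.tag == "arrow":
        return arrow(subst(ty.kids[0], x, s), subst(ty.kids[1], x, s))
    # ty is a forall whose body mentions x, so its binder differs from x.
    binder, body = ty.name, ty.kids[0]
    if binder in s.fv:
        # Renaming is needed only when the binder would capture a variable of s.
        fresh = f"{binder}%{next(_fresh)}"
        body = subst(body, binder, var(fresh))
        binder = fresh
    return forall(binder, subst(body, x, s))
```

Subtrees that do not mention the substituted variable are returned unchanged (the same object), and a binder is renamed only in the case where capture would otherwise occur.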


9.6.4  Fast  alpha-equivalence 

The hash-consed implementation allows for a fast syntactic equality check, since any two syntactically
equal types will be represented by the same type node and hence can be checked for equality
in constant time. In cases where the equality check fails, a normal equivalence check must be
performed, since alpha-variants of the same type will not in general be represented by the same
node.
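
The two-stage check can be sketched in Python as follows. The tuple encoding of types and the names `types_equal` and `_alpha` are hypothetical; the identity test plays the role of the constant-time check on shared nodes, and the fallback is an ordinary alpha-equivalence check.

```python
def types_equal(t1, t2):
    """Fast path on shared nodes, then fall back to alpha-equivalence."""
    if t1 is t2:
        return True                   # same node: constant-time success
    return _alpha(t1, t2, (), ())


def _alpha(t1, t2, env1, env2):
    """Alpha-equivalence; env1/env2 list enclosing binders, innermost first.

    Types are tuples (hypothetical encoding):
      ("var", name) | ("arrow", t1, t2) | ("forall", name, body)
    """
    if t1[0] != t2[0]:
        return False
    tag = t1[0]
    if tag == "var":
        i1 = env1.index(t1[1]) if t1[1] in env1 else None
        i2 = env2.index(t2[1]) if t2[1] in env2 else None
        if i1 is None and i2 is None:
            return t1[1] == t2[1]     # both free: names must agree
        return i1 == i2               # both bound at the same binder depth
    if tag == "arrow":
        return (_alpha(t1[1], t2[1], env1, env2)
                and _alpha(t1[2], t2[2], env1, env2))
    if tag == "forall":
        return _alpha(t1[2], t2[2], (t1[1],) + env1, (t2[1],) + env2)
    raise ValueError(f"unknown type constructor: {tag}")
```

The identity shortcut is applied only at the top level in this sketch; applying it underneath binders would additionally require that the free variables of the shared node be bound identically on both sides.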

9.6.5 Weak Head-Normalization


Weak  head-normalization  is  a  common  operation  on  types,  since  it  is  the  first  step  in  checking  type 
equality.  In  addition  to  representing  types  as  a  directed  acyclic  graph,  it  is  important  to  normalize 
them  as  graphs  as  well  to  avoid  repeated  reductions  of  the  same  terms.  This  is  implemented  in  the 
LIL  back  end  by  associating  with  every  hash-consed  type  node  an  optional  weak  head-normal  form. 
The  reduction  algorithm  then  checks  for  a  previously  computed  normal  form  before  reducing  the 
term  and  avoids  re-computing  it  if  present.  If  the  normal  form  is  not  present  it  must  be  computed: 
but  the  new  normal  form  can  be  stored  with  the  node  for  use  by  subsequent  reductions  of  the 
same  shared  node.  This  technique  (among  others)  has  been  previously  described  and  empirically 
characterized by Shao et al. [SLM98].
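
The caching of normal forms on nodes can be sketched in Python as below. The `Con` class, the `REDUCTIONS` counter and the naive `subst` helper are assumptions for the example; the helper assumes bound names never clash with free ones, and new nodes are not hash-consed here, both of which a real implementation would have to handle.

```python
class Con:
    """Type constructor node with an optional cached weak head-normal form."""

    def __init__(self, tag, *args):
        self.tag = tag                # "var", "lam", "app", or a constant
        self.args = args
        self.whnf = None              # filled in on first normalization


REDUCTIONS = {"count": 0}             # instrumentation for illustration only


def subst(c, x, s):
    """Naive substitution of s for x in c (assumes no variable capture)."""
    if c.tag == "var":
        return s if c.args[0] == x else c
    if c.tag == "lam":
        y, body = c.args
        return c if y == x else Con("lam", y, subst(body, x, s))
    if c.tag == "app":
        return Con("app", subst(c.args[0], x, s), subst(c.args[1], x, s))
    return c


def whnf(c):
    """Weak head-normalize c, reusing a stored normal form if present."""
    if c.whnf is not None:
        return c.whnf                 # computed by an earlier reduction
    result = c
    if c.tag == "app":
        head = whnf(c.args[0])
        if head.tag == "lam":         # beta-redex at the head
            REDUCTIONS["count"] += 1
            x, body = head.args
            result = whnf(subst(body, x, c.args[1]))
        elif head is not c.args[0]:
            result = Con("app", head, c.args[1])
    c.whnf = result                   # store for later reductions of this node
    result.whnf = result              # a normal form normalizes to itself
    return result
```

Normalizing the same shared node twice performs the beta-reduction only once; the second call returns the stored result.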

9.6.6  Engineering  the  LIL  type  checker 

A  valuable  benefit  of  using  a  typed  internal  language  in  a  compiler  is  that  it  allows  the  output 
of each compiler phase to be checked to ensure that nothing untoward has happened. Many, if
not most, compiler bugs will cause a compiler to generate type-incorrect code at some point. An
internal  type  checker  therefore  provides  the  valuable  service  of  catching  and  pinpointing  compiler 
bugs  before  runtime,  when  they  are  likely  to  be  much  harder  to  track  down. 

In  order  to  reap  these  benefits  however,  the  compiler’s  internal  type  checker  must  be  efficient 
enough  to  quickly  check  anything  the  compiler  produces.  However,  the  simple  technique  described 
in  the  previous  section  for  traversing  types  as  directed  acyclic  graphs  cannot  be  applied  naively  to  a 
type  checker.  The  reason  is  that  type  checking  is  a  context  sensitive  operation:  syntactically  equal 
types  may  be  well-formed  in  one  typing  context  and  not  in  another  (for  example  because  of  the 
presence  of  free  variables  referring  to  different  binding  sites).  In  the  TILT  compiler,  maintaining 
the  simplicity  of  the  typechecker  implementation  was  an  important  goal  since  the  correctness  of 
the  typechecker  is  both  important  and  hard  to  test  (at  least  in  terms  of  soundness).  However, 
the  straightforward  approach  of  completely  ignoring  sharing  of  types  and  kinds  proved  completely 
impractical.  Typechecking  small  programs  easily  exhausted  the  resources  of  a  Pentium  4  machine 
with  one  gigabyte  of  RAM.  As  a  first  step,  it  was  necessary  to  find  a  way  to  avoid  redundant  type 
checking. 

Exploiting  physical  sharing  in  the  type  checker 

While  the  nature  of  typechecking  is  context  sensitive,  it  is  by  far  the  most  common  case  that 
checking  syntactically  equal  terms  will  produce  the  same  result  (that  is,  that  the  terms  will  be 
well-typed at the same kind). Moreover, it is most frequently the case that two syntactically equal
terms are judged to be well-typed for the same reasons: that is, that the contexts in which they are
checked  are  (essentially)  equal.  A  first  cut  at  improving  the  performance  of  the  typechecker  then  is 
to  keep  a  table  mapping  types  to  contexts  in  which  they  have  already  been  judged  equivalent  (as 
well  as  the  kind  classifying  them).  Before  checking  a  type,  the  compiler  first  checks  a  lookup  table 
to  see  if  the  type  under  consideration  has  been  checked  before  in  an  equivalent  context  (and  with 
an  equivalent  kind). 

Strict pointwise equality on typing contexts could in principle be somewhat expensive to compute,
and moreover is vastly more restrictive than is necessary. It is frequently the case that two
syntactically equal terms will be checked in two contexts which differ, but only in irrelevant ways
(for example, one context may be a well-formed extension of the other). An improvement on the
simple technique described in the previous paragraph is to observe that so long as the two contexts
are both well-formed and are pointwise equal on the free variables of the type under consideration,
it is unnecessary to re-check the term in the new context.

These  techniques  suffice  to  permit  traversal  of  types  as  DAGs  in  the  LIL  typechecker  with  a 
fairly  small  overhead. 
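
The cache described above, keyed on a type node together with the context restricted to that type's free variables, might look as follows in Python. The classes `Ty` and `CheckCache`, the function `kind_of`, and the two-constructor type language are all assumptions made for this sketch; contexts are assumed to be well-formed dictionaries from type variables to kinds.

```python
class Ty:
    """Hash-consed type node (stand-in) that records its free variables."""

    def __init__(self, tag, name=None, kids=()):
        self.tag, self.name, self.kids = tag, name, tuple(kids)
        if tag == "var":
            self.fv = frozenset([name])
        else:
            self.fv = frozenset().union(*(k.fv for k in self.kids))


class CheckCache:
    """Per type node: the relevant context bindings and the resulting kind."""

    def __init__(self):
        self._entries = {}            # Ty -> list of (bindings, kind)

    @staticmethod
    def _relevant(ty, ctx):
        # Only the bindings of the free variables of `ty` can matter.
        return {v: ctx.get(v) for v in ty.fv}

    def lookup(self, ty, ctx):
        relevant = self._relevant(ty, ctx)
        for bindings, kind in self._entries.get(ty, []):
            if bindings == relevant:
                return kind
        return None

    def record(self, ty, ctx, kind):
        entry = (self._relevant(ty, ctx), kind)
        self._entries.setdefault(ty, []).append(entry)


def kind_of(ty, ctx, cache, stats):
    """Return the kind of `ty` in `ctx`; raise TypeError if it is ill-formed."""
    cached = cache.lookup(ty, ctx)
    if cached is not None:
        return cached                 # checked before in an equivalent context
    stats["checked"] += 1
    if ty.tag == "var":
        if ty.name not in ctx:
            raise TypeError(f"unbound type variable {ty.name}")
        kind = ctx[ty.name]
    elif ty.tag == "arrow":
        for kid in ty.kids:
            if kind_of(kid, ctx, cache, stats) != "T":
                raise TypeError("arrow expects types of kind T")
        kind = "T"
    else:
        raise TypeError(f"unknown constructor {ty.tag}")
    cache.record(ty, ctx, kind)
    return kind
```

A type checked once in a context is not re-checked in an extension of that context, but it is re-checked when the binding of one of its free variables differs.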

Context  substitutions 

The  technique  described  in  the  previous  section  avoids  most  redundant  checking  of  types,  but  is  not 
sufficient  in  and  of  itself  to  make  checking  large  programs  practical  in  TILT.  The  LIL  presents  an 
unusual  engineering  challenge  because  of  the  nature  of  its  type  analysis  mechanism.  Several  of  the 
LIL  type  checking  rules  induce  substitutions  into  typing  contexts  as  part  of  the  type  refinement 
mechanism.  For  example,  in  one  of  the  rules  for  type  checking  type  pair  refinements,  an  explicit 
type  pair  is  substituted  for  a  variable  in  the  typing  context. 

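\[
\frac{
  \begin{array}{c}
  \Psi;\ \Delta, \beta{:}\kappa_1, \gamma{:}\kappa_2, \Delta';\ \Gamma[(\beta,\gamma)/\alpha] \vdash e[(\beta,\gamma)/\alpha] : \tau[(\beta,\gamma)/\alpha]\ \mathrm{exp} \\
  \Delta, \alpha{:}\kappa_1 \times \kappa_2, \Delta' \vdash c \equiv \alpha : \kappa_1 \times \kappa_2
  \end{array}
}{
  \Psi;\ \Delta, \alpha{:}\kappa_1 \times \kappa_2, \Delta';\ \Gamma \vdash \mathrm{let}\ (\beta,\gamma) = c\ \mathrm{in}\ e : \tau\ \mathrm{exp}
}
\]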

This context substitution operation ($\Gamma[(\beta,\gamma)/\alpha]$) turns out to be quite expensive since, implemented
naively, it involves duplicating the entire typing context after every type refinement. The fast
substitution  cut  off  afforded  by  the  free  variable  sets  on  type  nodes  means  that  types  in  the  context 
are  not  traversed  unnecessarily,  but  in  the  absence  of  a  very  specialized  data  structure,  the  container 
implementing  the  context  must  be  copied  in  entirety.  Since  type  refinement  operations  are  quite 
frequent  in  many  programs,  this  rapidly  becomes  unmanageable. 

There  are  numerous  approaches  that  one  could  take  to  making  this  more  efficient.  The  approach 
taken  in  the  LIL  backend  is  based  primarily  on  two  observations.  Firstly,  many  elements  of  a  typing 
context  will  not  be  affected  by  the  context  substitutions  at  all.  Of  those  that  are,  many  of  them  will 
not  actually  be  referenced  in  any  given  term:  it  is  often  the  case  that  the  body  of  a  type  refinement 
construct  will  have  at  most  two  or  three  free  variables.  Any  effort  spent  traversing  unreferenced 
context  elements  is  wasted. 

Secondly, it is common to encounter several type refinement operations consecutively. Implemented
naively, this means that multiple passes over the typing context may be made with no
intervening variable lookups.

The  LIL  implementation  of  context  substitution  therefore  attempts  to  do  two  things.  In  order 
to  take  advantage  of  the  second  property,  substitutions  are  aggregated  and  carried  out  lazily. 
Substituting  into  a  context  simply  records  the  substitution  without  carrying  it  out.  Multiple 
substitutions  can  consequently  be  aggregated  together.  In  order  to  take  advantage  of  the  first 
property,  these  aggregated  substitutions  are  then  only  carried  out  on  the  result  of  a  variable  lookup. 
Full  context  substitutions  are  never  performed. 

There  are  a  number  of  subtle  points  involved  in  this  optimization.  It  is  clearly  not  valid  to 
simply  maintain  a  substitution  mapping  variables  to  types  and  apply  it  to  every  type  that  is  looked 
up  in  the  context,  since  different  substitutions  will  have  been  in  place  at  different  binding  sites. 
In  the  next  section,  I  give  an  informal  presentation  of  an  extension  to  the  static  semantics  of 
the  LIL  that  captures  this  optimization,  and  then  briefly  discuss  the  differences  with  the  actual 
implementation.




Fast  context  substitution 

As a first cut, I extend the notion of a typing context $\Gamma$ with substitution nodes. For brevity, I
elide 64-bit context entries; their addition is trivial.

\[ \Gamma ::= \bullet \mid \Gamma, x{:}\tau \mid \Gamma, \alpha \mapsto c \]

In  order  to  ensure  that  substitutions  are  carried  out  correctly,  I  replace  the  implicit  use  of  context 
re-ordering  in  the  LIL  static  semantics  with  explicit  re-ordering  rules  that  carry  out  substitution 
nodes  as  needed. 

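\[
\frac{\Psi;\ \Delta;\ \Gamma, y{:}\tau_y, x{:}\tau_x, \Gamma' \vdash sv : \tau}
     {\Psi;\ \Delta;\ \Gamma, x{:}\tau_x, y{:}\tau_y, \Gamma' \vdash sv : \tau}
\qquad
\frac{\Psi;\ \Delta;\ \Gamma, \alpha \mapsto c, x{:}\tau_x[c/\alpha], \Gamma' \vdash sv : \tau}
     {\Psi;\ \Delta;\ \Gamma, x{:}\tau_x, \alpha \mapsto c, \Gamma' \vdash sv : \tau}
\]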

The  variable  lookup  rule  is  then  restricted  in  the  usual  way,  requiring  that  variables  be  re-ordered 
to  the  end  of  the  context  for  lookup. 


\[
\frac{\Delta \vdash \Gamma, x{:}\tau\ \mathrm{ok} \qquad \vdash \Psi\ \mathrm{ok}}
     {\Psi;\ \Delta;\ \Gamma, x{:}\tau \vdash x : \tau}
\]

Context  substitutions  can  then  be  implemented  in  terms  of  these  substitution  nodes. 

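\[
\frac{
  \begin{array}{c}
  \Psi;\ \Delta, \beta{:}\kappa_1, \gamma{:}\kappa_2, \Delta';\ \Gamma, \alpha \mapsto (\beta,\gamma) \vdash e[(\beta,\gamma)/\alpha] : \tau[(\beta,\gamma)/\alpha]\ \mathrm{exp} \\
  \Delta, \alpha{:}\kappa_1 \times \kappa_2, \Delta' \vdash c \equiv \alpha : \kappa_1 \times \kappa_2
  \end{array}
}{
  \Psi;\ \Delta, \alpha{:}\kappa_1 \times \kappa_2, \Delta';\ \Gamma \vdash \mathrm{let}\ (\beta,\gamma) = c\ \mathrm{in}\ e : \tau\ \mathrm{exp}
}
\]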

This version of the variable pair refinement rule simply adds a substitution node ($\Gamma, \alpha \mapsto (\beta,\gamma)$)
instead of explicitly substituting for $\alpha$ as in the original version ($\Gamma[(\beta,\gamma)/\alpha]$).

This  extension  to  typing  contexts  captures  the  notion  that  variable  substitution  need  only  be 
performed  on  referenced  variables,  and  can  be  deferred  until  the  point  of  reference.  This  easily 
extends  to  capture  the  idea  of  aggregating  substitutions  by  replacing  the  single  substitutions  in 
the context with simultaneous substitutions for multiple variables. A simultaneous substitution $\Theta$
maps variables to types.

\[ \Theta ::= \bullet \mid \alpha \mapsto c, \Theta \]

The operation of a substitution $\Theta$ on a type $c$ is defined by a function $c[\Theta]$ mapping types to types.


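\[
\begin{array}{rcll}
\alpha[\alpha \mapsto c, \Theta] & \stackrel{\mathrm{def}}{=} & c & \\
\alpha'[\alpha \mapsto c, \Theta] & \stackrel{\mathrm{def}}{=} & \alpha'[\Theta] & (\alpha \neq \alpha') \\
\alpha[\bullet] & \stackrel{\mathrm{def}}{=} & \alpha &
\end{array}
\]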

The remaining cases proceed compositionally over the structure of types exactly as with a normal
substitution (taking care to avoid capture when crossing binding sites). Composition of simultaneous
substitutions $\Theta \circ \Theta'$ is defined explicitly as an operation on substitutions obeying the equation
$c[\Theta \circ \Theta'] = (c[\Theta])[\Theta']$.


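\[
\begin{array}{rcl}
\bullet \circ \Theta & \stackrel{\mathrm{def}}{=} & \Theta \\
(\alpha \mapsto c, \Theta) \circ \Theta' & \stackrel{\mathrm{def}}{=} & \alpha \mapsto (c[\Theta']),\ (\Theta \circ \Theta')
\end{array}
\]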
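
The definitions of substitution application and composition can be transcribed into a short Python sketch in order to check the composition equation on an example. Substitutions are represented as lists of (variable, type) pairs with the first matching entry taking effect; the type encoding is hypothetical and has no binders, so capture does not arise here.

```python
def apply(theta, ty):
    """Apply a simultaneous substitution to a type.

    `theta` is a list of (variable, type) pairs; the first entry for a
    variable wins. Types are tuples (hypothetical encoding):
      ("var", name) | ("pair", t1, t2) | ("int",)
    """
    if ty[0] == "var":
        for name, replacement in theta:
            if name == ty[1]:
                return replacement
        return ty                     # empty substitution: variable unchanged
    return (ty[0],) + tuple(apply(theta, kid) for kid in ty[1:])


def compose(theta1, theta2):
    """Compose so that applying the result equals theta1 followed by theta2."""
    if not theta1:
        return list(theta2)
    (name, c), rest = theta1[0], theta1[1:]
    return [(name, apply(theta2, c))] + compose(rest, theta2)


theta1 = [("a", ("pair", ("var", "b"), ("var", "g")))]
theta2 = [("b", ("int",)), ("a", ("var", "z"))]
c = ("pair", ("var", "a"), ("var", "b"))
one_pass = apply(compose(theta1, theta2), c)
two_passes = apply(theta2, apply(theta1, c))
```

On this example `one_pass` and `two_passes` coincide, which is the property that lets adjacent substitution nodes be merged into one.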




Aggregation  of  substitutions  in  typing  contexts  is  implemented  by  replacing  single  substitution 
nodes  with  simultaneous  substitution  nodes. 

\[ \Gamma ::= \bullet \mid \Gamma, x{:}\tau \mid \Gamma, \Theta \]

The  re-ordering  rule  for  substitution  nodes  is  modified  appropriately. 

\[
\frac{\Psi;\ \Delta;\ \Gamma, \Theta, x{:}(\tau_x[\Theta]), \Gamma' \vdash sv : \tau}
     {\Psi;\ \Delta;\ \Gamma, x{:}\tau_x, \Theta, \Gamma' \vdash sv : \tau}
\]

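A new structural rule is introduced permitting adjacent substitutions to be aggregated using
substitution composition.

\[
\frac{\Psi;\ \Delta;\ \Gamma, \Theta_1 \circ \Theta_2, \Gamma' \vdash sv : \tau}
     {\Psi;\ \Delta;\ \Gamma, \Theta_1, \Theta_2, \Gamma' \vdash sv : \tau}
\]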


Fast  context  substitution  in  practice 

The actual implementation used in the LIL backend differs from this presentation slightly. While
the formal system implements typing contexts as lists, the implementation uses a splay tree
for efficient variable lookup. In order to avoid the need to design and implement
a custom balanced tree implementation supporting explicit substitution nodes, I chose instead to
maintain the substitution nodes in an auxiliary data structure, allowing the core splay tree data
structure to remain unchanged.

Notionally,  the  implemented  version  can  be  derived  from  the  system  described  above  in  the 
following  manner.  Number  the  substitution  nodes  in  a  typing  context,  starting  from  the  “leftmost” 
substitution  node.  With  every  variable/type  pair  in  the  context,  associate  the  number  of  the  first 
substitution  to  its  right  (that  is,  the  first  substitution  that  applies  to  it).  Finally,  remove  the 
substitution  nodes  into  a  separate  data  structure  associating  each  substitution  with  its  index. 

Upon  looking  up  a  variable,  apply  the  composition  of  all  substitutions  with  indices  greater  than 
or  equal  to  that  associated  with  the  variable  (that  is,  all  substitutions  that  were  to  the  right  of 
the variable in the original context). This corresponds to the sequence of substitutions that would
have  been  performed  by  the  sequence  of  structural  re-ordering  steps  required  to  move  the  variable 
to  the  end  of  the  context.  As  an  optimization,  this  composition  of  substitutions  may  be  computed 
eagerly. 

Upon  inserting  a  variable  into  the  context,  associate  with  it  the  first  index  larger  than  the 
largest  currently  used  index. 

Upon  substituting  into  the  context,  associate  the  new  substitution  with  the  first  index  larger 
than the largest currently used index. The composition of this substitution with previous substitutions
may be eagerly computed if it is more efficient to do so. If no variables have been inserted into
the  context  since  the  last  context  substitution  was  performed,  it  is  also  possible  to  simply  replace 
the  substitution  associated  with  the  largest  index  with  the  composition  of  the  old  substitution  and 
the  new  substitution,  avoiding  the  addition  of  a  new  substitution  node.  Intuitively,  this  corresponds 
to  eagerly  applying  the  context  composition  structural  rule. 
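
The indexing scheme just described can be sketched in Python as follows. This is a simplified illustration, not the LIL code: the class name `LazyContext` is hypothetical, a dictionary stands in for the splay tree, indices are simply positions in a list of recorded substitutions, and the optional merging of adjacent substitutions is omitted.

```python
def apply(theta, ty):
    """Apply a simultaneous substitution (dict: variable -> type) to a type.

    Types are tuples (hypothetical encoding without binders):
      ("var", name) | ("pair", t1, t2) | ("int",)
    """
    if ty[0] == "var":
        return theta.get(ty[1], ty)
    return (ty[0],) + tuple(apply(theta, kid) for kid in ty[1:])


class LazyContext:
    """Typing context in which substitutions are recorded, not carried out."""

    def __init__(self):
        self._bindings = {}   # x -> (declared type, index of first applicable substitution)
        self._substs = []     # recorded substitutions; list position is the index

    def insert(self, x, ty):
        # Substitutions recorded so far lie to the left of x and do not
        # apply to it; the next one recorded will be the first that does.
        self._bindings[x] = (ty, len(self._substs))

    def substitute(self, theta):
        self._substs.append(dict(theta))    # record only; nothing is traversed

    def lookup(self, x):
        ty, first = self._bindings[x]
        for theta in self._substs[first:]:  # oldest applicable one first
            ty = apply(theta, ty)
        return ty
```

A variable inserted before two substitutions sees both of them on lookup, while a variable inserted between them sees only the later one; no binding is touched until it is looked up.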

This algorithm provides a straightforward implementation of explicit substitution nodes in a
typing context without requiring any modification to the underlying data structure. There are additional
improvements possible to the algorithm. For example, in the static semantics as presented
above,  it  is  possible  to  (non-deterministically)  choose  to  apply  the  context  re-ordering  rules  for 
a  variable  at  the  first  point  in  a  derivation  past  which  more  than  one  immediate  sub-derivation 
contains  a  lookup  of  that  variable.  In  this  way,  the  application  of  any  given  substitution  to  the 
type  associated  with  a  variable  can  be  guaranteed  to  be  performed  at  most  once.  The  algorithm 
as implemented does not implement this: the substitution is re-applied to the original type upon
every variable lookup. In practice this does not seem to be a problem. However, if after further
engineering this became a bottleneck, it would be straightforward to memoize the application of
substitutions to ensure that no unnecessary traversals of types are done.

9.7  Compilation  units 

The  definition  of  Standard  ML  [MTHM97]  does  not  define  a  notion  of  separate  compilation.  In 
order  to  remedy  this,  the  TILT  compiler  defines  a  notion  of  interface  that  generalizes  Standard 
ML  signatures,  and  uses  these  interfaces  to  mediate  between  separately  compiled  source  units.  Any 
unit  of  source  code  may  be  compiled  in  complete  isolation  from  any  other  unit  upon  which  it  relies, 
so  long  as  suitable  interfaces  are  provided. 

The certifying backend described in this dissertation implements the full TILT separate compilation
system. This is done by viewing each compilation unit as a functor which maps its imported
units  to  its  exports.  In  this  way,  the  linking  process  is  modeled  as  function  application,  avoiding 
the  need  for  a  complicated  type  theory  to  track  the  initialization  of  globally  visible  locations. 

More  concretely,  a  LIL  compilation  unit  may  be  thought  of  as  a  pair  consisting  of  a  type 
component  and  a  term  component.  The  type  component  is  a  type  function  mapping  imported 
types  to  exported  types.  Similarly,  the  term  component  is  a  function  mapping  imported  terms 
to  exported  terms.  In  actuality,  the  type  portion  is  implemented  not  as  a  function,  but  instead 
by  using  the  TALx86  linker  directly:  each  unit  lists  imported  types  with  their  kinds  as  explicit 
imports  instead  of  parameterizing  the  components  over  them. 

A  LIL  interface  classifying  such  a  compilation  unit  is  a  translucent  sum:  the  first  component 
of  which  is  a  kind  classifying  the  type  element  of  the  compilation  unit,  and  the  second  component 
of  which  is  a  type  classifying  the  term  element  of  the  compilation  unit.  The  pair  is  dependent  in 
the  usual  manner:  the  term  classifier  portion  may  refer  to  the  label  of  the  type  classifier  portion. 

The TALx86 linker as defined and implemented by Glew et al. [GM99] does not in fact support
translucent sums, and so it was necessary to extend the TALx86 implementation with an
alternative  (and  simpler)  notion  of  typed  linking  which  implements  the  standard  translucent  sum 
matching  rules.  This  was  the  only  significant  change  needed  in  the  TALx86  infrastructure. 

9.7.1  Compiler  generated  files 

The  LIL  backend  emits  four  files  for  every  compilation  unit.  The  first  is  a  typed  assembly  file 
named  asm.tal  containing  the  decorated  assembly  code  for  the  unit:  this  corresponds  to  a  LIL 
compilation  unit  as  described  above.  The  interface  of  a  compilation  unit  (whether  explicit  or 
derived)  is  compiled  to  a  term  export  file  named  asm_e.tali  classifying  the  exported  term  portion 
of  the  unit,  and  a  type  export  file  named  tali  classifying  the  exported  type  portion.  Finally,  an 
additional file is emitted containing the signatures of any external C functions used by the unit.

The TALx86 assembler type checks and assembles the typed assembly file (asm.tal) to produce
two  object  files:  a  standard  object  file  (obj.o)  and  a  type  object  file  (obj.to)  containing  the  type 
annotations  necessary  for  typechecking  the  object  file.  In  the  process,  it  verifies  that  the  provided 
code  matches  the  interfaces  specified  in  the  asm_e.tali  and  tali  files. 

After  compiling  each  individual  unit  in  a  program,  TILT  must  also  produce  an  additional  unit 
to  serve  as  the  “link”  unit  for  the  entire  program:  that  is,  a  unit  which  applies  each  compilation 
unit  function  to  its  imports  to  produce  its  exported  result.  This  exported  result  is  then  passed  on  to 
subsequent  units  which  import  it.  This  “link”  unit  implements  the  ML  level  linking.  The  TALx86 
linker  is  used  to  typecheck  and  construct  the  final  executable  program,  linking  together  the  “link” 
unit  with  each  of  the  compilation  units  providing  the  individual  unit  initialization  functions. 
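
The view of linking as function application can be illustrated at the term level with a small Python sketch. The unit names, the dictionary representation of imports and exports, and the `link` function are all hypothetical; the sketch ignores types and kinds entirely and shows only how a link unit threads exported results to later units.

```python
def list_unit(imports):
    """A unit with no imports that exports a `length` function."""
    def length(xs):
        return sum(1 for _ in xs)
    return {"length": length}


def main_unit(imports):
    """A unit that imports `List` and exports a computed result."""
    length = imports["List"]["length"]
    return {"answer": length(["a", "b", "c"])}


def link(units):
    """Play the role of the "link" unit.

    `units` is a list of (name, imported unit names, unit function), given in
    dependency order. Each unit function is applied to the exports of the
    units it imports, and its result is made available to later units.
    """
    exports = {}
    for name, needs, unit in units:
        exports[name] = unit({n: exports[n] for n in needs})
    return exports


program = link([
    ("List", [], list_unit),
    ("Main", ["List"], main_unit),
])
```

Because each unit is an ordinary function of its imports, no globally visible location is ever read before it has been initialized: a unit cannot be applied until the exports it needs exist.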


9.8  Measurements 

The  goal  of  this  dissertation  is  to  demonstrate  the  feasibility  of  certifying  compilation  in  a  type 
analysis  framework.  The  bulk  of  the  dissertation  is  concerned  with  simply  developing  and  describing 
the  framework  necessary  for  this  process.  This  provides  the  first  argument  for  feasibility:  that  it 
is  possible  at  all.  This  section  is  intended  to  provide  some  evidence  that  not  only  is  it  possible, 
but  in  fact  that  it  is  practical.  In  the  following  sections,  I  present  some  empirical  measurements  of 
the  performance  of  certifying  TILT.  In  particular,  I  present  measurements  quantifying  the  size  of 
the  generated  type  annotations  on  the  emitted  code,  and  measurements  of  the  run  time  of  various 
benchmark  programs  with  comparisons  to  other  compilers. 

It  is  important  to  re-iterate  here  that  engineering  the  compiler  to  improve  its  behavior  along 
either  of  these  axes  is  beyond  the  scope  of  this  thesis.  In  both  cases,  the  engineering  goal  was 
simply  to  develop  a  working  system,  and  to  measure  the  result.  No  significant  effort  was  spent  on 
improving  the  system  based  on  these  measurements,  and  inspection  of  the  emitted  code  suggests 
that  substantial  improvements  along  both  of  these  axes  could  be  implemented  without  running  into 
any  fundamental  limitations  of  the  framework. 

9.8.1  Benchmarks 

The  benchmark  programs  measured  in  the  following  sections  were  selected  to  be  representative  of 
a  number  of  different  sorts  of  programs,  ranging  in  size  from  24  to  more  than  2000  lines  of  code. 
Each  benchmark  was  compiled  separately  to  an  object  file,  and  then  subsequently  linked  into  a 
testing  harness  from  which  it  was  run.  The  benchmarks  were  also  linked  against  the  full  Standard 
ML  Basis  Library  which  provides  many  of  the  basic  data  types  for  Standard  ML,  along  with  access 
to  system  I/O  facilities.  Several  of  the  benchmarks  use  additional  data  structure  implementations 
from  the  SML/NJ  library.  Figure  9.3  lists  the  benchmark  programs  used  in  this  section,  along  with 
their  sizes  (in  lines  of  code).  Note  that  the  size  given  is  for  each  single  benchmark  file  only:  code 
from other compilation units (such as libraries and the test harness) is not included in this count.

Benchmark name   Description                             Lines
Taku             uncurried function calls                   24
Takc             curried function calls                     25
Fib              fib, fact with default Int                 38
Fib32            fib, fact with 32 bit ints                 39
PI               approximation of pi (fp)                   28
Msort            Merge sort (lists)                         48
ISort            Insertion sort (lists)                     50
Quick            Quicksort (version 1)                     130
Quick2           Quicksort (version 2)                     152
Life             Game of life (lists)                      205
FFT              Fast Fourier transform                    271
P Queens         P queens problem (arrays)                 292
Frank            Small theorem prover                      473
Leroy            Knuth-Bendix completion (exceptions)      537
BarnesHut        N-body simulation                         684
Simple           Spherical fluid dynamics                  860
Tyan             Gröbner basis calculation                 896
Boyer            Theorem proving                           959
Lexgen           Lexical-analyzer generator               1178
Pia              Perspective inversion algorithm          2074

Figure 9.3: Benchmarks

9.8.2 Type size

As discussed in section 1.4, one of the most commonly cited applications for certifying compilation
is as a security mechanism for mobile code. Certified code that is downloaded to be run from
potentially untrusted sources (or over untrusted communication channels) can be checked for any
violations of the safety properties implied by the type system. An important property of such
a certification system is that the certificate size should not be unduly large, so that the additional
security provided does not come at a prohibitive cost in terms of the bandwidth needed. In this section,
I provide measurements quantifying the size overhead of the certificates generated by TILT.

Figure 9.4: Size breakdown of individual benchmark units, in kilobytes.

In section 9.7.1 I described the files generated by the TILT compiler and the TALx86 assembler.
At the assembly code level, there are notionally two elements of a TILT compiled binary: a
typed assembly file containing the actual assembly code annotated with type information (asm.tal),
and some additional files describing the type of the exported interface provided by the binary (the
asm_e.tali, asm_i.tali, and tali files). The typed assembly file refers to the exported interface files
of  any  units  which  it  uses,  and  any  units  which  use  it  will  in  turn  refer  to  its  exported  interface  files. 
Interface  files  mediate  between  compilation  units,  while  the  type  annotations  within  an  assembly 
file  allow  individual  units  to  be  checked. 

The  typed  assembly  file  is  further  split  by  the  assembler  into  a  standard  untyped  object  file 
(obj.o)  and  a  type  annotation  file  (obj.to)  that  contains  sufficient  information  to  reconstruct  the 
annotations  on  the  untyped  object  file. 

A reasonable measurement of the certificate size for a compilation unit after assembly then is
the sum of the sizes of its export interface files (the asm_e.tali, asm_i.tali, and tali files) and
the type annotation file generated by the assembler (obj.to), since these represent the incremental
contribution of each compilation unit to the overall certificate size of the entire compilation system.
Note that all of the interface information is present only to support separate compilation.
Once linked together, almost all of the interface information can be discarded: in this sense this
measurement is an upper bound.
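
As a concrete illustration of the measurement just described, the following Python sketch sums the sizes of the interface files and the type annotation file for one unit and reports the certificate's share of the total. The `UnitSizes` record and its field names are hypothetical conveniences introduced here, and any byte counts passed to it are illustrative rather than measurements from the TILT benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitSizes:
    """Sizes in bytes of the files emitted for one compilation unit (hypothetical record)."""
    obj_o: int       # untyped object file (obj.o)
    obj_to: int      # type annotation file (obj.to)
    asm_e_tali: int  # asm_e.tali interface file
    asm_i_tali: int  # asm_i.tali interface file
    tali: int        # tali interface file


def certificate_size(u: UnitSizes) -> int:
    """Certificate size: the interface files plus the type annotation file."""
    return u.obj_to + u.asm_e_tali + u.asm_i_tali + u.tali


def certificate_fraction(u: UnitSizes) -> float:
    """Share of the unit's total size (object code plus certificate) taken by the certificate."""
    total = u.obj_o + certificate_size(u)
    return certificate_size(u) / total
```

Because the interface files are counted in full for every unit, this figure is the per-unit upper bound discussed above, not the cost after linking.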
Figure 9.5: Size breakdown of individual benchmark units, relative. (Legend: obj.o, obj.to, asm_e.tali, asm_i.tali.)

9.8.3  Benchmark  unit  sizes 

In  this  section,  I  give  some  size  measurements  for  the  TILT  benchmark  suite.  Each  of  the  units 
under  consideration  is  a  single  compilation  unit  compiled  from  a  single  source  file  to  a  single  object 
file.  These  object  files  are  linked  together  with  libraries,  a  test  harness  and  the  link  unit  to  produce 
the  final  executable.  Note  that  only  the  benchmark  units  themselves  are  included  in  this  section. 
Measurements  of  the  libraries  and  the  link  unit  are  discussed  in  the  next  section. 

Figure  9.4  gives  the  absolute  size  in  kilobytes  of  each  benchmark.  The  columns  are  sorted  by 
increasing  size  of  the  generated  (untyped)  object  file.  The  segments  of  each  column  indicate  the 
contribution  of  each  of  the  different  files  making  up  the  compilation  unit.  The  size  overhead  of  the 
safety  certificate  is  everything  other  than  the  object  file  itself  (the  bottom  segment).  Note  that  in 
all cases, the contribution of the asm_i.tali file (containing the types of imported C functions) is
too  small  to  be  visible. 

Figure  9.5  provides  a  different  view  of  this  same  data:  in  this  figure  each  segment  of  each  column 
indicates  a  percentage  of  the  total  size  of  the  unit  contributed  by  a  particular  source.  For  very  small 
programs,  the  amount  of  type  information  dominates  the  object  code  size.  For  larger  programs, 
the  percentage  of  the  total  space  usage  devoted  to  the  certificate  decreases.  This  reflects  a  certain 
amount  of  fixed  overhead  required  for  each  program:  basic  types  such  as  exception  handlers,  types 
for  printing  routines,  etc.  As  programs  get  larger,  the  cost  of  this  fixed  overhead  goes  down.  For  the 
largest  of  the  benchmarks,  the  certificate  occupies  roughly  sixty  percent  of  the  total  size.  Summed 
over  all  of  the  benchmarks,  the  certificate  information  occupies  roughly  eighty  percent  of  the  total 
space,  indicating  roughly  a  factor  of  five  space  penalty  for  certification. 
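
The step from "eighty percent of the total" to "a factor of five" is a one-line calculation: if the certificate is a fraction f of the total size, then the total is 1/(1 - f) times the size of the object code alone. A minimal Python sketch of that arithmetic follows; the function name is an illustrative choice, not something defined elsewhere in this dissertation.

```python
def expansion_factor(certificate_fraction: float) -> float:
    """Total size divided by object-code size, given the certificate's share of the total."""
    if not 0.0 <= certificate_fraction < 1.0:
        raise ValueError("certificate fraction must be in [0, 1)")
    return 1.0 / (1.0 - certificate_fraction)
```

With the rounded figures quoted above, a share of 0.8 corresponds to a factor of about 5, and the 0.6 share seen for the largest benchmark corresponds to a factor of about 2.5.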
Export interfaces

The  topmost  visible  segment  in  both  graphs  represents  the  contribution  to  the  total  size  of  the 
exported  interface  of  each  of  the  object  files.  Generally  speaking,  this  component  is  of  comparable 
size  to  the  type  information  in  the  object  file  itself.  As  would  be  expected,  for  bigger  programs  the 
percentage  overhead  of  the  exported  interface  decreases  somewhat,  since  the  overhead  is  amortized 
over  a  larger  amount  of  actual  object  code.  In  fact,  this  overhead  is  probably  almost  entirely 
eliminable,  for  three  reasons. 

First,  by  the  nature  of  the  compilation  approach  taken  in  the  LIL  backend,  each  of  these  export 
interfaces describes a single function mapping each of the object file's logical imports to its logical
exports.  Consequently,  the  size  of  this  file  increases  significantly  with  the  number  of  imported  units. 
In  addition,  this  implies  that  almost  all  of  the  type  decorations  in  an  export  interface  file  are  already 
present  in  the  export  interface  files  of  its  ancestors.  It  is  very  likely  that  this  redundant  information 
could  be  eliminated  entirely  by  exporting  a  single  canonical  abbreviation  for  the  exported  type  of 
each  unit  along  with  its  exported  type  component. 

Second, because TILT implements separate compilation, the interface file and the actual implementation
files may be produced and used independently. As a consequence, each contains its own
copy  of  the  type  of  the  main  exported  initialization  function  (in  fact,  the  entire  export  interface  file 
consists  solely  of  this).  It  is  clear  that  in  the  common  case  where  both  files  are  produced  as  part  of 
the  same  compilation,  this  redundancy  could  be  factored  out  into  a  common  definition  site.  This 
is  almost  certainly  expressible  in  the  TALx86  linking  system  without  modification. 

Finally,  note  that  these  export  interfaces  exist  solely  to  mediate  between  compilation  units. 
Linked  as  a  whole  program,  all  of  this  interface  disappears.  Moreover,  in  many  cases  (such  as  in 
the  benchmark  suite),  almost  none  of  the  entry  points  described  by  these  interfaces  are  exported 
from  the  local  group  of  compilation  units.  Even  in  the  Standard  ML  Basis  Library,  there  are  many 
compilation  units  whose  logical  scope  is  entirely  local  to  the  library.  A  partial  linking  strategy 
wherein  groups  of  compilation  units  are  closed  up  into  a  single  file  presenting  a  single  export 
interface  would  almost  certainly  eliminate  a  great  deal  of  this  overhead  in  most  cases,  even  when 
whole  program  linking  is  not  possible. 

9.8.4  Libraries  and  linking 

The  measurements  presented  in  figures  9.4  and  9.5  cover  only  the  benchmark  programs  and  the 
testing  harness.  These  files  must  be  linked  against  additional  libraries  before  running:  the  Standard 
ML  Basis  Library,  the  SML/NJ  Library,  and  a  small  command  line  argument  parsing  library.  It  is 
likely  that  in  a  certified  compilation  system,  a  subset  of  these  libraries  would  be  provided  by  the 
client,  rather  than  as  part  of  the  certified  binary.  This  is  particularly  true  for  the  Standard  ML 
Basis  Library,  which  encapsulates  the  operating  system  functionality.  Nonetheless,  it  is  useful  to 
measure the behavior of the compiler on these libraries as well, as part of the overall system.

In  addition  to  the  libraries,  there  is  one  additional  compilation  unit  that  makes  up  part  of  the 
final  executable  program.  As  discussed  previously,  the  compilation  strategy  employed  in  TILT 
maps each compilation unit to a single entry point implementing a function which takes as arguments
the logical imports of the unit, and computes the logical exports as a result. The final step
in compilation then is to create a unit which stitches together the whole program at runtime by
applying each of these functions in turn. I refer to this unit as the link unit, since it implements
the logical linking of the program. Note that this is distinct from the TALx86 notion of linking,
Figure 9.6: Total sizes of compilation groups (kilobytes).


which  is  used  to  resolve  the  free  references  within  the  link  unit  back  to  the  exported  entry  points 
of  each  compilation  unit. 
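
The logical linking performed by the link unit can be pictured with a small Python sketch. This is only a model under simplifying assumptions: units are ordinary functions from a dictionary of imports to a dictionary of exports, the list of units is assumed to be already in dependency order, and the names `link`, `list_util`, and `main` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Exports = Dict[str, object]
# A compiled unit: a function from the exports of its imports to its own exports.
Unit = Callable[[Dict[str, Exports]], Exports]


def link(units: List[Tuple[str, List[str], Unit]]) -> Dict[str, Exports]:
    """Apply each unit's entry point in turn, as the link unit does at run time.

    `units` lists (name, imported unit names, entry point) and is assumed to be
    sorted so that every unit appears after the units it imports.
    """
    env: Dict[str, Exports] = {}
    for name, imports, entry in units:
        args = {dep: env[dep] for dep in imports}  # KeyError if the order is wrong
        env[name] = entry(args)
    return env


# Hypothetical two-unit program: `main` imports `list_util`.
def list_util(_imports: Dict[str, Exports]) -> Exports:
    return {"sum": lambda xs: sum(xs)}


def main(imports: Dict[str, Exports]) -> Exports:
    return {"result": imports["list_util"]["sum"]([1, 2, 3])}


program = link([("list_util", [], list_util), ("main", ["list_util"], main)])
```

The sketch makes visible why the link unit must mention the type of every unit's entry point and result: it is the one place where all of them are applied.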

Figure  9.6  compares  the  absolute  sizes  in  kilobytes  of  each  of  the  major  compilation  groups  of 
the  benchmark  suite:  the  three  libraries  (the  Basis,  the  SML/NJ  Library,  and  the  command  line 
argument  library),  the  link  unit,  and  the  benchmarks  themselves  (including  the  testing  harness). 
As  before,  each  column  is  broken  into  segments  showing  the  contribution  to  the  total  of  each  of  the 
constituent  files.  Also  as  before,  figure  9.7  presents  the  same  data  as  a  percentage  of  the  total  for 
each  group. 

The  Basis  Library 

There are several interesting points to note here. Firstly, it is clear that the Basis library (and to a
lesser extent the SML/NJ library) has more certificate overhead than the benchmark suite. (This
is also true of the argument library, but this is most likely because of its very small size.) While
the Basis and the benchmarks viewed as a whole have similar amounts of actual object data, the
certificate size for the Basis is larger by roughly a factor of four. While I have not investigated this
phenomenon in detail, I conjecture that this is partially a result of the particular
architecture  of  the  Basis  library.  Within  the  Basis,  there  are  numerous  units  which  consist  solely 
of  a  few  (or  even  one)  functor  applications,  or  which  contain  structures  which  aggregate  numerous 
other  structures  together  as  sub-structures.  Such  units  produce  almost  no  object  code,  but  have 
quite  large  types. 

Further  evidence  for  this  can  be  seen  in  figure  9.8,  which  presents  the  contributions  of  each  of 
Figure 9.8: Breakdown of certificate overhead across individual Basis units (kilobytes)
the constituent elements for each compilation unit in the Basis library. For clarity given the large
number  of  files,  the  graph  is  presented  as  a  continuous  area  chart.  As  with  the  previous  graphs, 
the  units  are  sorted  by  object  file  size.  For  the  most  part,  the  certificate  size  (the  sum  of  the  top 
components)  increases  fairly  smoothly  with  object  file  size.  However,  there  are  a  number  of  very 
substantial  spikes  at  points  in  the  graph  where  very  small  files  generate  certificates  comparable  to 
those  of  the  largest  files  in  the  library.  All  of  these  spikes  that  I  have  examined  in  detail  correspond 
to  the  sort  of  aggregating  files  discussed  above.  Since  for  such  files,  much  of  the  type  information 
will  have  already  been  written  out  in  the  export  files  of  the  logically  imported  compilation  units, 
it  is  likely  that  much  of  this  overhead  could  be  eliminated,  even  in  a  separate  compilation  setting 
(since  the  type  decorations  needed  to  describe  the  logical  imports  must  be  present  in  the  interface 
file,  which  is  in  turn  needed  for  separate  compilation).  However,  in  the  current  framework  these 
small  units  with  large  types  increase  the  aggregate  certificate  overhead  substantially. 

The  link  unit 

A  second  interesting  observation  about  figures  9.6  and  9.7  is  that  the  size  of  the  link  unit  is  entirely 
dominated  by  the  certificate  size.  The  actual  object  code  for  the  link  unit  makes  up  less  than  one 
percent  of  the  total  size,  and  the  total  size  itself  is  substantial  in  absolute  terms  (significantly  larger 
than  all  of  the  benchmarks  put  together).  In  principle  this  is  actually  somewhat  understandable. 
The  link  unit  refers  to  every  compilation  unit  in  the  entire  program,  including  all  of  the  libraries. 
It  therefore  must  be  able  to  refer  to  the  type  of  the  exported  entry  point  (and  its  result)  for  each 
compilation unit. In some sense then, it is not surprising that the size of the certificate for the link
unit  should  be  comparable  to  the  sum  of  the  sizes  of  the  export  interfaces  for  all  of  the  units  in  the 
program. 

However,  all  of  the  type  information  needed  to  describe  the  entry  points  of  the  compilation 
units  must  be  present  in  the  export  interfaces  of  the  compilation  units  themselves.  And  since  these 
export  interfaces  are  needed  by  the  link  unit,  there  is  no  reason  that  the  link  unit  needs  to  contain 
its  own  copy  of  these  types.  In  fact,  the  typed  assembly  code  produced  by  the  LIL  backend  takes 
advantage  of  this  property  by  using  the  abbreviation  mechanism  provided  by  the  TALx86  system. 
The  link  unit  code  generated  by  the  LIL  backend  simply  refers  to  the  type  of  a  compilation  unit  by 
using  a  canonical  abbreviation  name  that  is  given  a  definition  by  the  export  interface  of  the  unit. 
The  assembly  code  produced  by  the  LIL  backend  for  the  link  unit  is  consequently  quite  compact. 
Surprisingly,  the  TALx86  assembler  seems  unable  to  preserve  this  compact  representation. 

Figure  9.9  compares  the  aggregate  sizes  of  the  compilation  groups  before  and  after  assembly.  For 
each  compilation  group,  the  first  column  represents  the  total  size  of  the  assembly  (asm.tal)  files  for 
the  unit,  and  the  second  column  represents  the  sum  of  the  object  and  typed  object  files  (obj.o  and 
obj.to). Note that the files from these two columns represent the same information. The object
and typed object files (obj.o and obj.to) are produced by assembling the assembly file (asm.tal),
and can subsequently be disassembled to reproduce the original assembly representation.

In general, the TALx86 assembler does a very good job of managing certificate sizes. In all cases
except the link unit, the assembled version is noticeably smaller than the original assembly files.
This is universally true for individual files within the compilation groups, as well as in aggregate.
The only exception to this is the link unit: in this case the assembled version is almost forty times
the size of the original assembly file (even though notionally they represent the same information).

While  I  have  not  investigated  this  in  detail,  I  believe  that  this  is  almost  certainly  because 
of a failure of the TALx86 assembler to preserve the sharing from the original assembly file. I
Figure 9.9: Assembly file size vs assembled output size (kilobytes)


conjecture  that  the  references  to  abbreviated  types  from  imported  units  in  the  original  assembly 
file  are  being  expanded  out  in  the  type  object  file.  I  do  not  believe  that  there  is  anything  inherent 
to  the  structure  of  the  link  unit  that  prevents  this  sharing  from  being  maintained:  it  should  be 
possible  to  engineer  an  assembler  to  preserve  this  information. 
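
To see why expanding abbreviations could inflate the link unit so dramatically, consider the following toy Python model. It is a sketch of the conjecture only, not of the TALx86 object format: each abbreviation has some text of its own plus mentions of other abbreviations, a mention costs one token when sharing is preserved, and the unit names and sizes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# name -> (size of the definition's own text, abbreviations it mentions)
Defs = Dict[str, Tuple[int, List[str]]]


def shared_size(name: str, defs: Defs) -> int:
    """Size of `name` when each mention of an abbreviation costs one token."""
    own, mentions = defs[name]
    return own + len(mentions)


def expanded_size(name: str, defs: Defs, memo: Optional[Dict[str, int]] = None) -> int:
    """Size of `name` when every mentioned abbreviation is written out in full."""
    if memo is None:
        memo = {}
    if name not in memo:
        own, mentions = defs[name]
        memo[name] = own + sum(expanded_size(m, defs, memo) for m in mentions)
    return memo[name]


# Hypothetical program: each unit mentions its predecessor's type twice,
# and the link unit mentions every unit once.
defs: Defs = {
    "unit0": (10, []),
    "unit1": (10, ["unit0", "unit0"]),
    "unit2": (10, ["unit1", "unit1"]),
    "link": (3, ["unit0", "unit1", "unit2"]),
}
```

In this model the link unit costs 6 tokens with sharing and 113 once expanded, and the gap widens with every additional layer of units, which is qualitatively the behavior observed for the assembled link unit.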


Scalability 

The data in figure 9.5 suggest strongly that the certificate overhead scales well as the size of
compilation units increases. For larger units, the overall percentage overhead is substantially smaller
than  the  overhead  for  smaller  units.  In  order  to  demonstrate  the  scalability  of  the  system  further, 
I  took  additional  measurements  on  two  very  large  programs. 

The  first  of  these  large  programs  consisted  of  a  subset  of  the  benchmarks  along  with  all  of 
the  library  code  upon  which  they  rely,  concatenated  into  a  single  file.  In  addition  to  the  SML/NJ 
libraries,  this  included  a  large  portion  of  the  Standard  Basis  Library  as  well.  Unfortunately,  because 
of  a  limitation  with  the  TILT  foreign  function  interface,  it  was  not  possible  to  concatenate  the 
entire  Standard  Basis  Library  into  a  single  valid  Standard  ML  unit.  Therefore,  certain  of  the 
benchmarks  (such  as  those  performing  file  i/o)  could  not  be  included.  The  resulting  file  consisted 
of  8332  lines  of  code. 

The  second  of  these  large  programs  consisted  of  approximately  half  of  the  TILT  compiler  (by 
lines  of  code)  concatenated  into  a  single  file  and  compiled  as  a  whole  program.  Note  that  in  this 
case,  all  library  code  (including  the  Standard  Basis  Library)  was  compiled  separately.  The  resulting 
source  file  consisted  of  32555  lines  of  code! 

The  absolute  size  (in  kilobytes)  of  the  generated  type  and  object  files  is  given  in  figure  9.10. 
The same data is plotted in figure 9.11 as a percentage of the total, with comparisons to the totals
for  the  separately  compiled  libraries  and  benchmarks  as  shown  in  figure  9.7.  For  both  of  these 
large  programs,  the  certificate  overhead  is  substantially  smaller  than  the  aggregate  totals  for  the 
other  compilation  groups,  representing  approximately  60%  of  the  total  size.  This  is  comparable 
to  the  overhead  for  the  largest  of  the  benchmark  units  described  in  figure  9.5.  These  two  data 
points  combined  with  figure  9.5  strongly  suggest  that  while  there  is  a  noticeable  initial  overhead 
for  certification,  the  certificate  size  remains  a  relatively  constant  fraction  of  the  overall  size  for  a 
large  range  of  program  sizes.  That  is  to  say,  empirically  speaking  the  certificate  size  seems  to  grow 
linearly  with  the  program  size. 

9.8.5  Run  time 

The  LIL  backend  is  not  designed  to  be  an  optimizing  backend.  Code  generation  and  register 
allocation  are  both  done  in  a  single  linear  pass,  and  no  optimizations  are  done  after  code  generation. 
Nonetheless,  it  is  useful  to  measure  the  runtime  performance  of  the  compiled  code.  It  is  important 
that  certification  not  overly  restrict  the  compilation  process  in  such  a  way  as  to  make  efficient  code 
generation  impossible.  In  this  section,  I  argue  that  even  a  non-optimizing  certifying  backend  can 
produce  reasonably  efficient  code. 

In  order  to  make  this  argument,  I  present  comparisons  with  two  other  compilers:  the  Standard 
ML  of  New  Jersey  compiler  and  the  MLton  compiler.  These  comparisons  are  designed  to  give  some 
indication  of  the  relative  performance  of  the  certifying  TILT  backend  with  respect  to  the  state  of 
the  art  in  Standard  ML  compilers.  It  is  important  to  note  however  that  each  of  these  compilers  is 
fundamentally addressing a very different compilation problem from each of the others. For this
reason,  this  comparison  is  only  really  meaningful  at  the  most  general  high  level. 

The  MLton  compiler  is  designed  to  produce  very  efficient  code.  In  order  to  do  this,  it  compiles 
only  complete  programs.  The  Standard  ML  of  NJ  compiler  is  designed  to  be  used  as  an  interactive 
system.  As  such,  it  supports  incremental  (but  not  separate)  compilation.  It  performs  significant 
optimization  as  well,  but  is  limited  somewhat  by  the  needs  of  incremental  compilation  and  an 
interactive  frontend.  Both  of  these  compilers  implement  precise  garbage  collection. 

The  TILT  compiler  on  the  other  hand  implements  true  separate  compilation.  This  greatly 
limits  its  ability  to  optimize  code  that  crosses  compilation  units.  The  TALx86  backend  also  uses 
a  conservative  garbage  collector,  with  a  malloc  based  allocator. 
Figure 9.12 shows the relative performance of the compiled versions of the TILT benchmarks as
compiled by each of these three compilers. The results are normalized to the results for the MLton
compiler, which is generally the fastest. In order to get some sense of the overhead of separate
compilation, some of the benchmarks were compiled by TILT as whole programs as well (i.e. with
all of the library code and the Standard Basis Library included into one file). This data is presented
in the second column labelled TILT(Whole). Because of limitations in the TILT foreign function
interface, however, none of the benchmarks requiring i/o could be compiled as whole programs, and
so certain of the data points are missing.
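
For readers unfamiliar with this style of presentation, normalizing to a baseline simply divides every running time by the baseline compiler's time for the same benchmark, so that the baseline reads as 1.0 and a value of 4.0 means "four times slower". The Python sketch below shows the calculation; the function name and any timings passed to it are hypothetical and are not taken from figure 9.12.

```python
from typing import Dict


def normalize(times: Dict[str, float], baseline: str = "MLton") -> Dict[str, float]:
    """Express each compiler's running time as a multiple of the baseline's time."""
    base = times[baseline]
    return {compiler: t / base for compiler, t in times.items()}
```

A missing data point, such as a benchmark that could not be compiled as a whole program, is simply left out of the input dictionary for that benchmark.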

Overall,  the  certifying  backend  ranges  from  slightly  faster  than  MLton  to  almost  sixteen  times 
slower,  and  from  approximately  twice  as  fast  as  SML/NJ  to  seven  times  slower.  For  very  small 
benchmarks,  such  as  the  arithmetic  benchmarks  and  the  takc/taku  benchmarks,  TILT  does  quite 
well.  For  small  benchmarks  such  as  these,  there  is  no  penalty  for  the  inability  to  optimize  across 
compilation  units,  and  the  relatively  simple  register  allocation  in  TILT  is  sufficient.  The  one 
exception to this is the pi benchmark, on which TILT does quite poorly. This is almost certainly
because  of  the  lack  of  floating  point  register  allocation  in  TILT,  since  this  benchmark  is  dominated 
by  floating  point  operations. 

The  performance  of  the  larger  benchmarks  is  much  more  varied.  In  some  cases  (tyan,  lexgen, 
pqueens)  TILT  is  within  a  factor  of  two  of  one  or  both  of  the  other  compilers.  In  the  worst  case 
(leroy)  TILT  is  almost  sixteen  times  slower  than  MLton,  and  five  times  slower  than  SML/NJ.  I 
conjecture  that  part  of  this  is  likely  due  to  separate  compilation:  all  of  the  larger  benchmarks  cross 
compilation  unit  boundaries  a  fair  bit,  via  library  calls.  In  addition,  several  of  the  benchmarks 
involve  floating  point  computation,  and  several  of  them  are  fairly  allocation  intensive.  In  both  of 
these  areas,  TILT  is  likely  to  suffer:  from  the  lack  of  floating  point  register  allocation  in  the  first 
case,  and  from  a  more  expensive  memory  allocation  strategy  in  the  second  case. 

The  whole  program  version  of  each  TILT  compiled  benchmark  is  noticeably  faster  than  the 
separately  compiled  version:  in  one  particular  case  (PQueens),  faster  by  a  factor  of  approximately 
six.  This  is  despite  the  fact  that  all  of  the  TILT  optimizations  assume  a  separate  compilation 
setting  even  when  given  whole  programs. 
Chapter 10

Conclusions  and  future  work 


10.1  Summary 

In  this  dissertation  I  have  shown  that  certified  compilation  is  possible  for  full  Standard  ML,  even 
in  the  presence  of  type  analysis  based  optimizations. 

10.1.1  Theory 

To provide a theoretical foundation for this, I defined a series of translations mapping a polymorphically
typed lambda calculus extended with type analysis primitives to a typed assembly language.
These translations serve to make type analysis explicit in the intermediate representation; to translate
uses of functions to uses of closures; and to make control flow and machine state explicit in an
abstract  assembly  language. 

I  proved  each  of  these  translations  sound  in  the  sense  that  each  translation  maps  well-typed 
terms to well-typed terms. In order to avoid overly constraining the implementation, I also introduced
a novel approach for dealing with register allocation, allowing the typing assumptions
required for the proof of soundness to be separated from the semantic behavior of the register allocator.
In this way, the translation to assembly code remains parametric over the choice of register
allocation  algorithm,  so  long  as  the  algorithm  chosen  satisfies  certain  minimal  typing  restrictions. 

10.1.2  Practice 

In  order  to  validate  the  practicality  of  my  approach,  I  implemented  a  certifying  backend  for  the 
TILT  compiler  using  the  formal  translation  as  a  guide. 

The  TILT  middle  internal  language  corresponds  closely  to  the  polymorphically  typed  lambda 
calculus  used  as  a  starting  point  for  the  formal  compilation  process  described  in  this  dissertation. 
For each of the compilation passes described formally in this dissertation, I implemented a corresponding
compiler pass in the TILT compiler. In addition, I implemented a one-pass optimizer
to  perform  simple  optimizations  to  take  advantage  of  the  additional  program  structure  exposed  by 
the  new  translations.  The  final  target  of  this  backend  is  a  slightly  modified  version  of  the  TALx86 
framework.  The  final  translation  to  the  TALx86  language  follows  closely  the  format  of  the  formal 
translation  in  this  dissertation,  despite  the  significant  syntactic  differences  between  the  idealized 
typed  assembly  language  used  in  the  formal  presentation  and  the  TALx86  language. 
Separate compilation of full Standard ML is supported by this new backend, and a large number
of  programs  have  been  successfully  compiled  including  the  entire  Standard  ML  Basis  Library  and 
the  Standard  ML  of  New  Jersey  library.  In  addition  I  compiled  the  TILT  benchmark  suite  using  this 
typed  backend,  and  measured  the  results  of  compilation  both  in  terms  of  performance  as  compared 
to  other  Standard  ML  compilers,  and  in  terms  of  certificate  size  overhead  in  the  compiled  binaries. 


10.1.3  Conclusions 

Certified  compilation  is  possible  even  for  languages  as  rich  as  Standard  ML,  even  in  the  presence 
of  complex  type  based  optimizations.  Proving  the  soundness  of  typed  compiler  translations  in  this 
setting is feasible. However, there is a trade-off between the closeness of the formal model to the
implementation  and  the  complexity  of  the  proof  process.  In  this  dissertation  I  attempted  to  keep 
a  very  close  correspondence  between  the  formal  model  and  the  implementation. 

In  order  to  make  this  more  feasible  I  introduced  new  techniques  for  factoring  out  some  inessential 
choices  of  the  implementation,  such  as  the  particular  choice  of  mappings  of  variables  to  registers 
and  the  method  by  which  this  mapping  is  arrived  at.  While  this  particular  technique  should  scale 
to  more  complex  optimizing  code  generation,  it  is  likely  that  maintaining  the  close  correspondence 
between  the  formal  model  and  the  implementation  would  become  more  difficult  if  more  complex 
code  generation  and  optimization  techniques  were  performed. 

The problem of certificate size is shown here to be manageable. With no major
engineering  or  tuning  of  the  new  TILT  back  end,  certificate  size  overhead  was  shown  to  be  in 
general  quite  reasonable.  While  the  overhead  for  supporting  separate  compilation  makes  up  almost 
half  of  the  aggregate  certificate  size  for  the  measured  programs,  simple  analysis  of  the  structure  of 
compilation  units  shows  that  much  of  this  overhead  could  be  eliminated  with  a  more  sophisticated 
mechanism  for  sharing  type  abbreviations  across  interfaces.  Numerous  other  opportunities  exist 
for  eliminating  redundant  type  information,  both  between  units  and  within  units. 

In  the  one  case  where  the  certificate  overhead  was  observed  to  be  drastically  larger  than  expected 
(the link unit discussed in the previous section), this was shown to be due to a failure of the TALx86
assembler  (which  otherwise  performed  admirably)  rather  than  a  structural  failure  of  the  compiler. 

10.1.4  Compiler  engineering 

Maintaining  type  information  on  the  compiler  intermediate  forms  imposes  an  additional  burden 
on  a  compiler  writer,  just  as  programming  in  a  typed  language  imposes  an  additional  burden  on 
a programmer. As is the case with type-safe programming languages however, it is becoming
increasingly clear that the engineering benefits of typed compilation substantially outweigh the
costs.  A  type  preserving  compiler  provides  a  form  of  automatic  self-checking  that  is  extremely 
valuable  to  the  implementer  of  the  compiler. 

Compiler  bugs  are  notoriously  difficult  to  track  down  and  fix,  since  they  often  exhibit  themselves 
only  as  second-order  effects.  That  is,  the  compiler  itself  appears  to  run  correctly:  it  is  only  in 
the  running  of  the  generated  code  that  incorrect  behavior  appears.  To  make  matters  worse,  the 
problem  in  the  generated  code  may  very  well  manifest  itself  not  at  the  incorrect  program  point,  but 
at  some  arbitrary  later  point  in  the  program’s  run  (e.g.  because  of  memory  or  stack  corruption). 
Matching  an  incorrect  behavior  of  a  generated  program  to  the  bug  in  the  compiler  that  caused  it 
often requires an extensive process of analysis and deduction. By and large, the state of the art in
untyped compiler debugging relies on carefully stepping through optimized (and hence
often obfuscated) code in a debugger.

The  great  benefit  of  a  type-preserving  compiler  is  that  it  moves  the  point  of  discovery  for 
compiler  bugs  out  of  the  generated  code  and  into  the  compiler  itself,  for  a  large  class  of  bugs.  In 
other  words,  many  or  even  most  compiler  bugs  are  caught  and  flagged  as  soon  as  they  are  produced 
in  the  compiler,  rather  than  at  some  arbitrary  point  in  a  future  run  of  the  generated  program. 
Moreover,  if  a  self-check  is  run  between  every  compiler  pass  (at  least  while  in  development),  the 
point  at  which  the  error  is  flagged  indicates  not  only  the  location  of  the  error  in  the  intermediate 
code  but  also  the  particular  pass  of  the  compiler  that  is  the  most  likely  culprit  (i.e.  the  pass 
immediately preceding the failed self-check).
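
The debugging workflow described here can be sketched in a few lines of Python. This is a toy model rather than the TILT implementation: the "intermediate code" is a list of (value, declared type) pairs, the self-check is a simple `isinstance` test, and the pass names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

IR = List[Tuple[object, type]]  # toy intermediate code: (value, declared type) pairs


def well_typed(ir: IR) -> bool:
    """Toy self-check: every value must be an instance of its declared type."""
    return all(isinstance(value, declared) for value, declared in ir)


def run_passes(program: IR,
               passes: List[Tuple[str, Callable[[IR], IR]]]) -> Tuple[IR, Optional[str]]:
    """Run each pass in order, checking its output; report the first pass whose output fails."""
    ir = program
    for name, transform in passes:
        ir = transform(ir)
        if not well_typed(ir):
            return ir, name  # the most likely culprit is the pass that just ran
    return ir, None


passes = [
    ("double", lambda ir: [(v * 2, t) for v, t in ir]),
    ("buggy", lambda ir: [(str(v), t) for v, t in ir]),  # changes values, forgets the types
    ("never_reached", lambda ir: ir),
]
_, culprit = run_passes([(1, int), (2, int)], passes)
```

Here `culprit` names the offending pass directly, before any generated code has been run, which is the shift in the point of discovery that the text describes.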

Of  course,  type  safety  does  not  guarantee  correctness.  It  is  still  possible  for  the  compiler  to 
generate incorrect code that nonetheless happens to be well-typed. In practice, however, this seems
to be relatively rare: most errors (including most of the most pernicious ones) tend to be caught. In
particular, note that the entire class of errors involving memory corruption is guaranteed to be
caught by the type checker! Over the course of developing the certifying backend for TILT, my
experience  was  that  almost  all  compiler  bugs  were  caught  statically.  For  example,  while  developing 
the  register  allocator  (a  notorious  source  of  difficult  bugs),  all  of  the  register  allocation  bugs  that  I 
encountered  were  caught  by  the  TALx86  typechecker. 

Most  of  the  bugs  that  were  not  caught  by  the  typechecker  arose  from  the  more  permissive  nature 
of  the  TALx86  type  system  as  compared  to  the  Standard  ML  type  system.  For  example,  TALx86 
very reasonably defines 32-bit integer arithmetic to have silent overflow semantics: the compiler
is responsible for generating explicit overflow checks and raising exceptions as appropriate. If the
compiler fails to emit such a check, the code is still well-typed with respect to the TALx86 type
system; however, its behavior on overflow is incorrect with respect to the semantics of Standard
ML. An interesting direction for future work on typed assembly languages would be to provide
facilities for encoding such source language constraints into the type system (without specializing
the type system itself to a particular language).
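
To make the distinction concrete, here is a small sketch that models signed 32-bit arithmetic in Python. The names `wrap32`, `add_silent`, `add_checked` and `Overflow` are assumptions introduced for illustration. `add_silent` models machine addition that wraps on overflow, and `add_checked` models what the source language requires once the compiler has emitted the explicit check.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


class Overflow(Exception):
    """Models the exception the source language raises on integer overflow."""


def wrap32(n: int) -> int:
    """Reduce n to a signed 32-bit value using two's-complement wraparound."""
    return (n + 2 ** 31) % 2 ** 32 - 2 ** 31


def add_silent(a: int, b: int) -> int:
    """Machine-level addition: overflow wraps around silently."""
    return wrap32(a + b)


def add_checked(a: int, b: int) -> int:
    """Addition with the explicit overflow check the compiler must emit."""
    result = a + b
    if result < INT32_MIN or result > INT32_MAX:
        raise Overflow()
    return result
```

Both functions take and return 32-bit integers, so a type system that only tracks "32-bit integer" cannot tell them apart. The difference shows up only in behavior: `add_silent(INT32_MAX, 1)` wraps to `INT32_MIN`, while `add_checked(INT32_MAX, 1)` raises.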


10.2  Future  work 

This  dissertation  demonstrates  the  feasibility  of  performing  certified  compilation  for  Standard  ML 
in a type analysis framework. There remain several directions in which this work could be extended
to make it more useful and practical.

10.2.1  Optimization 

The  certifying  backend  implemented  as  part  of  this  thesis  is  for  the  most  part  not  an  optimizing 
one.  While  some  simple  optimizations  are  performed  on  the  LIL  intermediate  code,  inspection 
of  the  output  of  the  compiler  suggests  numerous  ways  in  which  the  intermediate  code  could  be 
improved  (particularly  after  closure  conversion). 

Some  of  these  improvements  are  as  simple  as  extending  the  LIL  language  with  additional 
constructs. A simple example of such an extension is that of heterogeneous tuples, which would
allow the closure converter to avoid boxing and unboxing 64-bit floating point numbers. Others
are more complex: for example, implementing partial redundancy elimination to relocate closure
environment operations used only conditionally.

In addition to improvements in the optimization of the intermediate code, there is considerable
room  for  improvement  of  the  code  generator  itself.  The  register  allocation  technique  used  is  quite 
simplistic:  a  more  sophisticated  algorithm  based  on  graph  coloring  or  graph  fusion  would  produce 
significantly  better  code.  No  floating  point  register  allocation  is  currently  done  at  all. 

The  code  generator  itself  is  quite  limited.  No  attempt  is  made  at  scheduling  instructions 
intelligently,  and  the  instruction  selection  is  fairly  ad-hoc.  While  it  does  a  relatively  good  job 
of  taking  advantage  of  the  CISC  nature  of  the  x86  instruction  set,  a  more  uniform  approach  would 
almost  certainly  improve  the  generated  code. 

The  implementation  of  exceptions  in  the  LIL  backend  is  also  quite  inefficient,  both  in  setting 
new  handlers  and  in  executing  handler  bodies. 

Finally,  I  believe  there  is  a  good  deal  of  work  left  to  be  done  in  tuning  and  improving  the 
use of type analysis in the TILT compiler. No tuning has been done to adjust the parameters
of the type analysis optimizations, such as the maximum width of a record to flatten into registers.
Additionally,  no  work  has  been  done  to  quantify  the  actual  benefits  of  type  analysis  as  currently 
implemented.  It  would  be  valuable  to  measure  the  effect  of  these  optimizations  in  isolation,  and 
to  use  this  information  to  look  for  additional  opportunities  to  take  advantage  of  the  type  analysis 
infrastructure  already  in  place. 

10.2.2  Garbage  collection 

The  TALx86  infrastructure  assumes  the  use  of  a  conservative  garbage  collector.  The  untyped 
TILT  backend  makes  use  of  type  information  at  runtime  to  do  precise  garbage  collection.  An 
interesting  topic  of  future  research  would  be  to  replace  or  extend  the  TALx86  infrastructure  in 
such  a  way  as  to  support  precise  garbage  collection  using  the  type  information  already  kept  at 
runtime  [VC03]. 

10.2.3  Infrastructure  improvements 

The  TALx86  certification  infrastructure  proved  impressively  flexible  in  serving  as  a  target  for  the 
TILT  compiler  with  a  minimum  of  modification.  However,  the  performance  of  the  TALx86  type 
checker  could  be  improved  upon  substantially.  Currently,  assembling  and  typechecking  large  units 
takes  dramatically  longer  than  the  entire  process  of  compilation  within  TILT  (including  numerous 
internal  type  checks).  While  the  type  system  is  more  detailed  and  low  level  at  the  TALx86  level, 
many  of  the  techniques  discussed  in  chapter  9  used  to  improve  the  performance  of  the  LIL  type 
checker  are  still  applicable. 

In addition, as discussed in section 9.8.4, there are a few important cases where the TALx86
assembler seems to fail to preserve the physical sharing present in the original typed assembly
source file, which degrades its performance on certain units immensely.

10.2.4  Reducing  the  trusted  computing  base 

Another  concern  with  the  TALx86  infrastructure  is  that  there  is  no  proof  of  type  safety  for  the 
language  as  implemented.  This  is  of  significant  concern  in  an  actual  certifying  compilation  system, 
since  in  the  absence  of  such  a  proof,  even  the  correctness  of  the  TALx86  type  checker  does  not 
necessarily  imply  the  safety  of  the  programs  it  certifies.  And  of  course,  there  is  no  guarantee  that 
the  implemented  type  checker  faithfully  and  soundly  implements  the  static  semantics  of  the  type 
system  for  the  language. 

Attempts have recently been made to address both of these problems by providing certified
code infrastructures that both implement a language with a formal proof of type safety and
try to reduce the complexity of the checking problem as much as possible so as to minimize the
amount of trusted code in the certifier [Cra03, App01]. A valuable area of future research would
be  to  re-target  the  certifying  TILT  backend  to  such  a  system. 

10.3  Conclusions 

This  dissertation  has  clearly  shown  that  certifying  compilation  is  feasible  for  a  rich  language  like 
Standard ML, even in the presence of type analysis and other advanced optimizations. The
problem  of  controlling  the  intermediate  size  of  programs  and  of  controlling  the  certificate  overhead 
on  the  generated  code  is  almost  certainly  tractable.  Even  in  the  system  implemented  in  this  thesis, 
which  made  no  effort  to  optimize  for  these  properties  beyond  the  minimum  necessary  to  demonstrate 
feasibility,  the  results  were  easily  within  reach  of  being  sufficient  for  a  practical  and  usable  system. 

The problem of connecting formal models of compiler translations to their actual implementations
remains a difficult one. As the formal models scale up to more closely model the actual
languages and transformations implemented in the compiler, the syntactic overhead and complexity
of proofs about the model increase significantly, to the point that confidence in the correctness of
the proofs must inevitably fall. New techniques for dealing with this complexity are needed, and
while  this  dissertation  attempts  to  address  this  to  a  certain  extent,  in  general  it  remains  an  open 
problem.  One  promising  approach  to  dealing  with  this  lies  in  the  use  of  logical  frameworks  to 
mechanically  check  proofs,  or  even  to  assist  in  generating  them. 

Certifying  compilation  is  the  natural  extension  of  typed  compilation.  In  this  role,  a  certifying 
compiler  greatly  increases  the  ability  of  the  compiler  writer  to  write  a  correct  compiler  by  extending 
the  benefits  of  type  checking  to  a  lower  level.  As  with  all  disciplines,  the  discipline  of  having  always 
to  consider  the  type  correctness  of  compiler  transformations  is  sometimes  burdensome.  I  believe 
that  it  is  also  a  valuable  one.  It  should  always  be  clear  to  the  compiler  writer  why  a  particular 
transformation  is  safe.  The  type  checker  in  a  type  preserving  compiler  enforces  this  discipline. 
It  is  important  that  the  flexibility  of  type  systems  for  compiler  internal  languages  continue  to  be 
improved  upon,  so  that  the  cases  where  the  type  checker  must  reject  safe  code  due  to  its  own 
limitations  can  be  made  increasingly  rare. 

Most  importantly,  certifying  compilation  provides  an  important  tool  for  coming  to  grips  with 
the  increasing  problem  of  providing  security  in  a  wide-open,  networked  universe.  Delivering  security 
along  with  downloaded  code  is  an  essential  part  of  providing  value  to  the  end  user.  This  thesis 
demonstrates  that  automatically  generating  such  security  in  the  form  of  certified  code  is  well  within 
our  grasp. 

Appendix A

MIL  static  semantics 


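Well-formed Kind

\[ \boxed{\vdash \kappa\ \mathrm{ok}} \]

\[
\frac{}{\vdash T_{32}\ \mathrm{ok}}
\qquad
\frac{\vdash \kappa_1\ \mathrm{ok} \quad \vdash \kappa_2\ \mathrm{ok}}
     {\vdash \kappa_1 \to \kappa_2\ \mathrm{ok}}
\qquad
\frac{\vdash \kappa_1\ \mathrm{ok} \quad \vdash \kappa_2\ \mathrm{ok}}
     {\vdash \kappa_1 \times \kappa_2\ \mathrm{ok}}
\]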


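Well-formed type context

\[ \boxed{\vdash \Delta\ \mathrm{ok}} \]

\[
\frac{}{\vdash \bullet\ \mathrm{ok}}
\qquad
\frac{\vdash \Delta\ \mathrm{ok} \quad \Delta \vdash \kappa\ \mathrm{ok}}
     {\vdash \Delta, \alpha{:}\kappa\ \mathrm{ok}}\ (\alpha \notin \Delta)
\]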
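
As an informal illustration of the kind and type-context rules above, the sketch below checks both judgments for the three MIL kind forms (T32, arrow, product). The class names and the list-of-pairs representation of contexts are assumptions made for this example; they are not taken from the TILT sources.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class T32:
    """The kind of 32-bit types."""


@dataclass(frozen=True)
class Arrow:
    """Function kind: dom -> cod."""
    dom: "Kind"
    cod: "Kind"


@dataclass(frozen=True)
class Prod:
    """Product kind: left x right."""
    left: "Kind"
    right: "Kind"


Kind = Union[T32, Arrow, Prod]
# A type context is an ordered list of (constructor variable, kind) bindings.
Context = List[Tuple[str, Kind]]


def kind_ok(kind: object) -> bool:
    """Well-formed kind: T32 is ok; arrows and products are ok if both parts are."""
    if isinstance(kind, T32):
        return True
    if isinstance(kind, Arrow):
        return kind_ok(kind.dom) and kind_ok(kind.cod)
    if isinstance(kind, Prod):
        return kind_ok(kind.left) and kind_ok(kind.right)
    return False


def context_ok(ctx: Context) -> bool:
    """Well-formed context: each kind is ok and no variable is bound twice."""
    seen = set()
    for name, kind in ctx:
        if name in seen or not kind_ok(kind):
            return False
        seen.add(name)
    return True
```

The freshness side condition on the context rule becomes the `name in seen` test: extending a context with a variable it already binds is rejected.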


Well-formed  Constructor 


A\-  c:  K 


h  A,  a:K,  A'  ok  h  A  ok  h  A  ok  h  A  ok 

A,a:K,A'\-a:K  A  hint  1X32  AI-Boxf:T32  A  h  Farray :  T32 


h  A  ok 


A  h  c :  T 


32 


A  h  c:T32 


AI-Exn:T32  A  h  Array^  :  T32  A  h  Dyntag^  :  T32 


AI-ci:T32  AI-C2:T32  A,  a:T32, /3:T32  h  ci :  T32  A,  01X32, /3:T32  h  C2  :  T32 


A  h  Vararg^_^  •X32 


A  h  fi{a,  (3). (01,02) :  X32  x  X32 


a,f3  ^  A 


A  h  Ci :  X32  i  G  0  . . .  j 
A  h  c:  X32 


A  h  (co  . . .  Cj)  ^  c :  X 


32 


AI-Ci:X32  iG0...j 
A  h  Cl  X  . . .  X  Co  :  X32 

AI-Cj:T32  iGO...j
AI-c:T32  m<n  +  j 


AI-Ci:T32  iGO...j 
A  h  c:T32 


A  I-  Sum„(co  . . .  Cj)  :  T32  AH  Sum™(co  . . .  cj) :  T32 


H  Ki  ok  A,  a:Ki  H  c :  ^2  A  H  ci :  ^  K2  A  H  C2  : 

_ a  ^  A  _ 

A  \- X(a  ::  k).c:  Ki  ^  K2  A  H  ci  C2  :  ^2 


A  H  Cl :  A  H  C2  :  K2 


A  H  (ci,C2)  :ki  X  K2 


Ah  c:  Ki  X  K2 

- (ZG{1,2}) 

Ah  TTic:  Ki 


Base  constructors  □ 

The  base  constructors  consist  of  Int,  Boxf,  arrows,  records,  projections  from  mu  types,  Vararg, 
Sum,  Array,  Farray,  Exn,  and  Dyntag.  The  non-record  base  constructors  consist  of  any  of  the  above 
except  record  types. 


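Constructor Equivalence

\[ \boxed{\Delta \vdash c = c' : \kappa} \]

\[
\frac{\Delta \vdash c : \kappa}{\Delta \vdash c = c : \kappa}
\qquad
\frac{\Delta \vdash c' = c : \kappa}{\Delta \vdash c = c' : \kappa}
\qquad
\frac{\Delta \vdash c_1 = c_2 : \kappa \quad \Delta \vdash c_2 = c_3 : \kappa}
     {\Delta \vdash c_1 = c_3 : \kappa}
\]

\[
\frac{\Delta \vdash c : \kappa_1 \to \kappa_2 \quad \alpha \notin FV(c)}
     {\Delta \vdash \lambda(\alpha{:}\kappa_1).c\,\alpha = c : \kappa_1 \to \kappa_2}
\qquad
\frac{\Delta \vdash c : \kappa_1 \times \kappa_2}
     {\Delta \vdash (\pi_1\, c, \pi_2\, c) = c : \kappa_1 \times \kappa_2}
\]

\[
\frac{\Delta \vdash c_1 : \kappa_1 \quad \Delta \vdash c_2 : \kappa_2}
     {\Delta \vdash \pi_1 (c_1, c_2) = c_1 : \kappa_1}
\qquad
\frac{\Delta \vdash c_1 : \kappa_1 \quad \Delta \vdash c_2 : \kappa_2}
     {\Delta \vdash \pi_2 (c_1, c_2) = c_2 : \kappa_2}
\]

\[
\frac{\vdash \kappa_1\ \mathrm{ok} \quad \Delta, \alpha{:}\kappa_1 \vdash c_1 : \kappa_2 \quad \Delta \vdash c_2 : \kappa_1}
     {\Delta \vdash (\lambda(\alpha{:}\kappa_1).c_1)\, c_2 = c_1[c_2/\alpha] : \kappa_2}
\]

\[
\frac{\vdash \kappa_1\ \mathrm{ok} \quad \Delta, \alpha{:}\kappa_1 \vdash c_1 = c_2 : \kappa_2}
     {\Delta \vdash \lambda(\alpha{:}\kappa_1).c_1 = \lambda(\alpha{:}\kappa_1).c_2 : \kappa_1 \to \kappa_2}
\qquad
\frac{\Delta \vdash c_1 = c_1' : \kappa_1 \to \kappa_2 \quad \Delta \vdash c_2 = c_2' : \kappa_1}
     {\Delta \vdash c_1\, c_2 = c_1'\, c_2' : \kappa_2}
\]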

A  h  Cl  ok  A  h  C2  ok  A  h  ci  =  x  [c'^, . . . ,  c^]  :  T32  n  <  flattenlimit


A  h  Vararg^^^^^  =  {c[, . . . ,  c'J  C2  :  T32 

A  h  Cl  ok  A  h  C2  ok  A  h  ci  =  x  [c^^, . . . ,  c^]  :  T32  n  >  flattenlimit 
A  I-  Vararg^^^^2  =  ci  ^  C2  :  T32 


A  h  Cl  ok  A  h  C2  ok  A  h  ci  =  c'i:T32 
where  c'l  is  a  non-record  base  constructor 

A  h  Vararg^^^^2  =  ci  ^  C2  :  T32 
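
Read informally, the three Vararg equivalence rules above say that a function whose argument is a known record narrower than the flatten limit takes its fields as separate arguments, while any other known argument is passed as a single value. The sketch below mirrors that case analysis. The constructor representation, the `Var` case and the value of `FLATTEN_LIMIT` are assumptions for illustration, and the boundary case where the width equals the limit is treated as "not flattened", which the extracted rules do not make fully clear.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Base:
    """A non-record base constructor such as Int or Boxf."""
    name: str


@dataclass(frozen=True)
class Var:
    """A constructor variable whose shape is not known statically."""
    name: str


@dataclass(frozen=True)
class Record:
    """A record (product) type with the given field types."""
    fields: Tuple[object, ...]


@dataclass(frozen=True)
class Arrow:
    """A function type taking the given arguments."""
    args: Tuple[object, ...]
    result: object


FLATTEN_LIMIT = 6  # hypothetical value; the real parameter is a compiler setting


def vararg(c1: object, c2: object, limit: int = FLATTEN_LIMIT) -> Optional[Arrow]:
    """Resolve Vararg for argument type c1 and result type c2, or None if it is stuck."""
    if isinstance(c1, Record):
        if len(c1.fields) < limit:
            return Arrow(c1.fields, c2)  # narrow record: one argument per field
        return Arrow((c1,), c2)          # wide record: passed as a single argument
    if isinstance(c1, Base):
        return Arrow((c1,), c2)          # non-record base constructor
    return None                          # unknown shape: no rule applies yet
```

For a constructor variable no rule applies until the variable is instantiated, which is why this sketch returns `None` in that case.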


A  h  c  =  c' :  Ki  X  K2 


A  h  c  =  c' :  Ki  X  K2 


A  h  TTi  C  =  VTi  c'  :  Kl 


A  h  712  c  =  7r2  c' :  K2 


Type  equivalence 


A  h  r  =  r' :  T32 


AI-c  =  c':T32  AI-ri  =  r':T32  fGl...n 


A  h  r(c)  =  r(c') :  T32  A  h  n  X  . . .  X  Tn  =  X  . . .  X  :  T32 


A  h  Ti  =  Tj' :  T32  7  G  1 . . .  n 

A  h  r  =  r' :  T32 


A  h  y[a::Ki, . .  .,a::Kn]{Ti, . . .  ,Tm){k)  ^t  =  V[a::Ki, . . .  ,a::Kn](r{, . . .  ,r^)(/c)  ^  r' :  T32 


Well-formed  float  value 


A;  r  h  /c  :  Float 


h  A  ok  A  h  r  ok 

_ r 

A;  r  h  r  :  Float 


h  A  ok  Ah  r[x/-]  ok 

_ fvar 

A;r[xj]h  Xf  : Float 




Well-formed  small  value 


A;  r  h  St! :  r 


A  h  r[a;:r]  ok 

_ var 

A;  r[a;:r]  h  x  :  r 

A  h  r  ok 

_ int 

A;  r  h  i :  Int 


A;r  h  sv:ci[Triij.{a,P){ci,C2)/a,Tr2i2{a,P){ci,C2)/P] 

_ roll 

A;r  H  roll^.^(a^^)(ci,c2)  sv  :  vr^  ^(a, /3)(ci,  C2) 

A  h  c  =  7ri/i(Q!,/3)(ci,C2)  :T32  A;  T  h  s?;  :  vr*  ^(a, /3)(ci,  C2) 
_ unroll 

A;r  h  unrollc  st :  Cj[vri //(a, /3)(ci,  C2)/a,  vr2 //(a, /3)(ci,  02)//?] 


A  h  Suin^  (^  :T32 

_ in j -tag 

A;r  h  inj_tag^(^-^ :  Suini(^ 

Well  formed  32  bit  instructions 


A;  r  h  SI! :  r 

_ sv 

A;  r  h  St! :  r 


A; r  h  opr : r 


A;  r  h  /?; :  Float 

_ boxf 

A; r  h  boxf  fv  : Boxf 


A;  r  h  svi :  n 

_ tuple 

A;r  h  {svi, .  .  .,SVn)  :tiX  ...  X  Tn 

A;  r  h  St! :  Cl  — >  C2 

_  vararg 

A;  r  h  vararg^^^^2  ■  Vararg^^^^^ 

A;  r  h  St! :  Vararg,


onearg 


A;  r  h  onearg^^^^^  sv.ci^  C2 

A;r  h  sv:y[ai::Ki, . . . , an-^nKri, . .  .,Tm){k)  r 
A;rh  SVi'.Ti  [ci/  CVl,  ...  ,  Cnj  Ckn] 

A;  r  h  fvj^ :  Float 

_ app 

A;r  h  s?;[ci, . . .  ,Cn](sti, . .  .,svm){fvi, . . .  ,/tfc)  :r[ci/ai, . . .  ,Cn/an] 

A  h  Sum|(^  :  T32 

A;  r  h  St! :  Sum|(^ 

_ proj 

A  h  Sum^  (^  :  T32  A;  F  h  s?; :  Cj 
_ inj 

Suini(c) 

A;  r  h  SI! :  Ti  X  . . .  X  Tn 

_ select 

A;  r  h  select*  sv  :  Ti 

A  h  t:T32 

A;  r  h  SI! :  Sumj(c)  A;  r[xj  :  Sum^  (^]  h  e,  :  r 
_ case 

A;  r  h  caseT-(s?;)  (xi.ei, . . . ,  x^.e^) :  r 
A;  r  h  sx  :  ExnA  h  r  :  T32 

A;  r  h  sxi  :  Dyntag  A;  r[xi:ci]  h  ei :  r  A;ri-e2:T 
_ t _ exncase 

A;  r  h  exncaseT-(sx)  (sxi  ^  xi.ei,  -  ^  62) :  r 

A;  r  h  Cl :  r  A;  r[x  :  Exn]  h  62  :  r 

_ handle 

A;  r  h  handleT-(ei,  x.e2) :  r 

A;rh  sxi :  Dyntag^  A;  F  h  SV2  '.  c 

_ inj -exn 

A;  F  H  inj_dyii^(sxi,  SV2)  ■  Exn 

A  h  c:  T32

A;rh  sill :  Array^  A;  F  h  SV2  ■  Int 

_ sub 

A;  r  h  subc(si’i,  SV2)  ■  c 


A;  r  h  sill :  Int  A;  F  h  SV2  ■  c 

_  array 

A;F  h  array^(s?;i,  s?;2)  :Array^ 


A;  F  h  SI! :  Int  Ah  fv  :  Float 

_  f array 

A;F  h  f  array (sv,fv)  :Farray 


A  h  r  :  T32  A;  F  h  SI! :  Exn 

_  raise 

A;  F  h  raise,-  sv  :  r 


A  h  c :  T32  A  h  F  ok 

_ mkexntag 

A;  F  h  mkexntagj,  :  Dyntag^ 


Well  formed  float  instructions 


A;  F  h  opr  :  Float 


A;  F  h  sr  :  Boxf 


A;  F  h  unboxf  sv  :  Float 


unbox 


A;  F  h  sill :  Farray  A;  F  h  SV2  ■  Int 

_ fsub 

A;  F  h  f  sub(s?;i,  SV2)  ■  Float 


A]T  h  fv  :  Float 

- fv 

A]T  h  fv  :  Float 

Well-formed Expression


A;  r  h  SI! :  r 

_ sv 

A-,T  \-  sv  :  T 

A;  r  h  i :  r'  A;  r[x  :  r']  h  e  :  r 

_ i 

A;  r  h  letr  X  =  i  in  e  :  r 


A; r  hi: Float  A;r[xj]he:T 

_ i64 

A;  r  h  let,-  X/  =  i  in  e  :  r 


A;r[/  :  y[ay.K]{T){\xf\)  — >  Tr,  ailK,  xTt,  Xf]  \-  ef.Tr 
A;r[/  :  V[a:':«:](f)(|x/|)  ^  r,.]  h  e  :  r 

A;  r  h  let,-  recT-^  /[a:'!K](xTV)(xj).ej  in  e  :  V[a:'!/t](r)(|xj|)  — >  r 


Well-formed  Context 


h  A  ok  A  h  r  ok  A  h  r  :  T32  A  h  T  ok 

_  _ X  ^  r  _ X64  ^  r 

A  h  •  ok  A  h  r,  x:r  ok  A  h  F,  XQi'.cp  ok 

Appendix B

LIL  static  semantics 


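Definitions

\[
\begin{aligned}
\kappa\ \mathrm{list} &\overset{\mathrm{def}}{=} \mu j.\,1 + \kappa \times j \\
\mathrm{nat} &\overset{\mathrm{def}}{=} \mu j.\,1 + j \\
0 &\overset{\mathrm{def}}{=} \mathrm{fold}_{\mathrm{nat}}\ \mathrm{inj}_1^{1+\mathrm{nat}}\ {*} \\
n + 1 &\overset{\mathrm{def}}{=} \mathrm{fold}_{\mathrm{nat}}\ \mathrm{inj}_2^{1+\mathrm{nat}}(n)
\end{aligned}
\]

Well-formed Kind

\[ \boxed{\Delta \vdash \kappa\ \mathrm{ok}} \]

\[
\frac{\vdash \Delta\ \mathrm{ok}}{\Delta \vdash T_{32}\ \mathrm{ok}}
\qquad
\frac{\vdash \Delta\ \mathrm{ok}}{\Delta \vdash T_{64}\ \mathrm{ok}}
\qquad
\frac{\vdash \Delta\ \mathrm{ok}}{\Delta \vdash 1\ \mathrm{ok}}
\qquad
\frac{\vdash \Delta, j, \Delta'\ \mathrm{ok}}{\Delta, j, \Delta' \vdash j\ \mathrm{ok}}
\]

\[
\frac{\Delta \vdash \kappa_1\ \mathrm{ok} \quad \Delta \vdash \kappa_2\ \mathrm{ok}}
     {\Delta \vdash \kappa_1 \to \kappa_2\ \mathrm{ok}}
\qquad
\frac{\Delta \vdash \kappa_1\ \mathrm{ok} \quad \Delta \vdash \kappa_2\ \mathrm{ok}}
     {\Delta \vdash \kappa_1 \times \kappa_2\ \mathrm{ok}}
\qquad
\frac{\Delta \vdash \kappa_1\ \mathrm{ok} \quad \Delta \vdash \kappa_2\ \mathrm{ok}}
     {\Delta \vdash \kappa_1 + \kappa_2\ \mathrm{ok}}
\]

\[
\frac{\Delta, j \vdash \kappa\ \mathrm{ok}}
     {\Delta \vdash \mu j.\kappa\ \mathrm{ok}}\ (j \notin \Delta,\ j \text{ only positive in } \kappa)
\qquad
\frac{\Delta, j \vdash \kappa\ \mathrm{ok}}
     {\Delta \vdash \forall j.\kappa\ \mathrm{ok}}\ (j \notin \Delta)
\]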

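Well-formed type and kind context

\[ \boxed{\vdash \Delta\ \mathrm{ok}} \]

\[
\frac{}{\vdash \bullet\ \mathrm{ok}}
\qquad
\frac{\vdash \Delta\ \mathrm{ok}}{\vdash \Delta, j\ \mathrm{ok}}\ (j \notin \Delta)
\qquad
\frac{\vdash \Delta\ \mathrm{ok} \quad \Delta \vdash \kappa\ \mathrm{ok}}
     {\vdash \Delta, \alpha{:}\kappa\ \mathrm{ok}}\ (\alpha \notin \Delta)
\]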

Well-formed Constructor


Ah  c:  K 


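Constants and their kinds

\[
\begin{array}{lll}
\mathrm{Float} : T_{64} & \mathrm{Int} : T_{32} & \mathrm{Void} : T_{32} \\
\mathrm{Array}_{32} : T_{32} \to T_{32} & \mathrm{Array}_{64} : T_{64} \to T_{32} & \mathrm{Boxed} : T_{64} \to T_{32} \\
\mathrm{Tag} : \mathrm{nat} \to T_{32} & \mathrm{Dyntag} : T_{32} \to T_{32} & \mathrm{Dyn} : T_{32} \\
\times : T_{32}\,\mathrm{list} \to T_{32} & \multicolumn{2}{l}{\to\ : T_{32}\,\mathrm{list} \to T_{64}\,\mathrm{list} \to T_{32} \to T_{32}} \\
\vee : T_{32}\,\mathrm{list} \to T_{32} & \multicolumn{2}{l}{\mathrm{Code} : T_{32}\,\mathrm{list} \to T_{64}\,\mathrm{list} \to T_{32} \to T_{32}} \\
\forall : \forall j.(j \to T_{32}) \to T_{32} & \multicolumn{2}{l}{\exists : \forall j.(j \to T_{32}) \to T_{32}} \\
\multicolumn{3}{l}{\mathrm{Rec} : \forall j.((j \to T_{32}) \to (j \to T_{32})) \to j \to T_{32}}
\end{array}
\]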


h  A  ok  h  A,  a:K,  A'  ok 
Ah*:!  A,  a:K,  A'  h  a:  k 

A  h  Ki  ok  A,  a:Ki  h  c:  K2  A  h  ci :  ^  K2  A  h  C2  :  ki 

_ a  ^  A  _ 

A  h  A(a  ::  k).c:  ki  K2 

A  h  Cl :  A  h  C2  :  K2 

(ci,C2)  h  Ki  X  /t2  : 


Ah  c:  Ki  (i  G  {1 . . .  n}) 

A  I-  c :  +[Ki,...,Kn] 


A  h  fij.K  ok 

A  I-  fold^y^  c:/uj.K 

A  h  ^j.K  ok  A  h  k'  ok  j,  a,  p,^  A 
A,j,a:K,p:{j  K'),h  c:k' 

A  h  pr(j,  a-.K,  p:{j  k'),  in  c)  :  pj.K  k' 

Ah  c.yj.K'  A  h  K  ok  A^jhc'.K 

-  - 3  ^  ^ 

A  h  c[k]  :  k'[k/ j]  A  h  Aj.c :  Vj.«; 


A  h  Cl  C2  :  ^2 


A  h  c :  Ki  X  K2 

- (ZG{1,2}) 

Ah  'KiC:  Ki 

A  h  c:  +  [ki,  ...,Kn] 

A^apKih  Ci\  K  iGl...n  ,  ^ 

- aif  A 

A  h  case  c[(ai.ci, . . . ,  On-Cn)]  :  « 


A  h  c:K[pj.K/j] 

Constructor Equivalence


A  h  c  =  c' :  K 


Ahci/i  A(-c'  =  c:k  AI-ci  =  C2:k  AI-C2  =  C3:«; 


AI-c=c:k  AI-c=c':k  AI-ci  =  C3:«; 


A  h  c :  Ki  ^  K2  a  ^  FV (c)  A  h  c :  x  K2 


A  h  X{a:Ki).ca  =  c:  ki^  K2  Ah  (tti  c,  7r2  c)  =  c  :  x  K2 


A  h  c:  +  [ki, 

A  h  case(c,  [oi.  ai, . . .  ,an-  «„])  =  c:  +  [ki,  . . . , 


A  h  Cl :  A  h  C2  :  K2 


A  h  7ri(ci,C2)  =  Cl  :ki 
A  h  Ki  ok 

A,  a:Ki  h  Cl :  ^2  A  h  C2  :  ki 
A  h  (A(a:Ki).ci)c2  =  ci[c2/a]  :  K2 


A  h  Cl :  Ki  A  h  C2  :  /i2 


A  h  7r2(ci,C2)  =  C2:K2 
A  h  K  ok  A,  j  h  c: 


A  h  (Aj.c)[/t]  =  c[K/j]  :  K'[K/j] 


Ah  c:  Ki  A  h  Kj  ok  j  G  1 ..  .n 
A  , Oj-.Kj  h  Cj  :  K  j  G  1 ..  .n 

A  h  case(inj^^'^^’'"’'""'  c,  [. . . ,  Ui-Ci,  ...])  =  Ci[c/ai]  :  k 

A  h  c' :  K[ijj.K/j]  A  h  k'  ok  (j,  a,  p,  ^  A) 

A,  j,  a:K,  yO:(j  ^  k'),  h  c :  /t'  Ah  pj.K  ok 

A  h  pr{j,a:K,p:{j  k'),  inc)  fold^j.^^c' 

=  c[pj.K,c',pT{j,a:K,p:{j  ^  k')  ±iic),/j,a,p] 

:  pj.K  k' 


A  h  Ki  ok  A  h  Cl  =  c'^ :  Ki  ^  K2 

A,  a:Ki  h  Cl  =  C2  :  K2  A  h  C2  =  C2  :  ki 

A  h  A(a:Ki).ci  =  \{a:Ki).C2  :  ni  ^  K2  Ah  C1C2  =  c'^C2  :  ^2 

A  h  c  =  c' :  Ki  X  K2  a  h  c  =  c'  :  ki  X  K2 

A  h  TTi  C  =  TTi  ch  Ki  A  h  7r2  C  =  7r2  ch  ^2 

A  h  c  =  c' :  Kj  A  h  ok  j  £  1 ..  .n

A  h  c  =  c' :  +  [k.1,  ...  ,Kn] 

A,  ai'.Ki  \-  Ci  =  c[:  K  i  G  1 . .  .n 

A  h  case(c,  [oi.ci, . . . ,  On-Cn])  =  case(c',  [ai.c'i, . . .  ,an-c^]) :  n 

A\-  c  =  c' :  K[fj,j.K/j] 

A  I-  f  old^j.K  c  =  f  old^j.K  c' :  ^ij.K 

j •>  ^5  P}  ^  ^ 

A\-fj,j.Kok  A  h  ok  A,  j,  a:K,  p:{j  ^  k'),\- c  =  c' :  k' 

A  h  pr(j,  a-.K,  p:j  k',  inc) 

=  pr(j,  a-.K,  p:j  k' ,  in  c') :  pj.K  k' 


A,  j  \-  c  =  c' :  k'  a  h  c  =  c'  :  Vj.k'  A  h  k  ok 

- j  ^  ^  - 

A  h  Aj.c  =  Aj.c' :  yj.K'  A  h  c[k\  =  c![k]  :  k'[k/ j\ 

Well-formed  term  context 


h  A  ok  A  h  r  ok  A  h  r  :  T32  A  h  T  ok  A  h  (/> :  T64 

_  _ x^T  _ 

Ah*  ok  A  h  r,  x'.T  ok  A  h  r,  XQi'.cj)  ok 

Well-formed  heap  context 


H  •  ok 


h  'k  ok  •Hr:  T32 

_ f 

H  'k,  t.T  ok 


Well-formed  64  bit  value 


A  Hr  ok  H4'ok  A  H  T,  X64:(/>,  T' ok  H  4' ok 
'k;  A;  r  H  r :  Float  'k;  A;  T,  XQ^-.cp,  T'  H  X64  :  (j) 


A  Hr  ok 


X64  i  r 


H  'k  ok 


A;r  H/r:</) 

Well-formed 32 bit Value


A;  r  h  St! :  r 


A  h  r,  x:t,  r'  ok  h  'k  ok  A  h  F  ok  h  'k  ok  A  h  F  ok  h  'ki,  f:r,  '^2  ok 


'k;  A;  F,  a;:r,  F' h  X  :  T  'k;  A;  F  h  i :  Int  'ki,  f:r, 'k2;  A;  F  h  f :  r 

A  h  r  =  Rec[K](c)(cp) :  T32  A  h  r  =  Rec[K](c)(cp) :  T32 

'k;  A;  F  h  sx  :  c(Rec[K]c)cp  'k;  A;  F  h  sx  :  r 

'k;  A;  F  h  toIIt  sv  '.t  'k;  A; F  h  unroll^  sv  :  c(Rec[K]c)cp 


'k;  A;  F  h  sill :  Dyntagr  'k;  A;  F  h  s?;2  :  r 

A;F  h  inj_dyn^(s?;i,  s?;2)  :  Dyn 

A  h  r  =  3[k](c')  :  T32  A\-c:k 
'k;  A;  F  h  sx  :  c'cj 

'k;  A;  F  h  pack  sv  as  r  hiding  c :  r 


A  h  c  =  V[-  •  •  •  •  •]  :  T32  'k;  A;F  h  sx  :  a 

'k;  A;  F  h  inj  _union^  sv  :  c 

'k;  A;  F  h  sx  :  V[k](c')  AhciK 
'k;  A;  F  h  s?;[c]  :  c'c 

h  'k  ok 


A  h  F  ok 
A;F  h  tag.  :  Tag(i) 


Well  formed  64  bit  operations 


4';A;F  h  f opr:  cj)  oprg^ 


\k;  A;  F  h /y  :  'k;  A;  F  h  sx  :  Boxed  (/) 


'k;  A;  F  h  /y  :  (/>  oprg^  'k;  A;  F  h  unbox  sv  :  cj)  oprg^ 


'k;  A;  F  h  sxi :  Arrayg4i;A 
'k;  A;  F  h  sv2  '■  Int 

4';  A;  F  h  s\ib^{svi,  SV2)  :  (j)  oprg^ 
Well-formed  32  bit  operation 


'k;  A;  F  h  opr  :  r  oprgj 


'k;  A;  F  h  St! :  r 


'k;  A;  F  h  St! :  r  oprgj 


'k;  A;  F  h  sx  :  x  (tq::  . . .  ::Ti::c') 
'k;  A;  F  h  select*  sv  :  Ti  opr32 


^■A-T^fv:4) 

'k;  A;  F  h  boxfv  :  Boxed  (j) 


'k;  A;  F  h  sxj :  Tj  i  G  0, . . . ,  n 
4';  A;  F  h  (s?;o,  •  •  • ,  svn)  :  x  [tq,  . . . ,  r^] 

4';A;r  I-  s?;:  \/[to,  . . .  ,Tk]  A;T,Xi:Ti  h  :  r  exp
A  h  Tj  =  Tag(i)  :T32  i  G  0  . . .  (j  -  1) 

A  h  Tj  =  x[Tag(i),T-]  :T32  i£j...k 

A;  r  h  case(s?;)(x.eo,  •  •  • ,  x.Ck)  :  r  oprgj 


A  h  t:T32 

A;r  h  dyntag^  :  Dyntagr  opr 33 

A;  r  h  SI! :  Dyn  'I';  A;  F  h  s?;i :  Dyntagri 
'F;  A;  F,  xi'.ti  F  ei :  r  exp  'F;  A;  F  h  e  :  r  exp 

'F;  A;  F  h  dyncase(s?;)(s?;i  ^  xi.ei,  _  ^  e)  :  r  opr32 


'F;  A;  F  h  St! :  Dyn  A  h  r  :  T32 
'F;  A;  F  h  raises  sv  :  r  oprgj 


'F;  A;  F  h  ei :  r  exp 
^';A;F,x:  Dyn  h  62  :  T  exp 

'F;  A;  F  F  handleT-(ei,  x.e2)  :  r  oprgj 


X  ^  F 


A;F  F  sx  :  ^  ([tq,  . . .  ,Tn]){[(j)o,  ■  ■  •  ,</>fc])(T) 
'F;  A;F  F  sxi  :Ti  'F;  A;  F  F /xj :  </>* 


A;F  F  sv{svo, . . . ,  sVn){fvo, . .  .JVf,):T  oprgg 


A;F  F  sx  :  Code[ro, . . .  ,Tn][4>o,  ■  ■  ■  Ak]{T) 
A;F  F  sxj  :ri  A;  F  F /xj :  </>» 


A;F  F  call  s?;(s?;o,  •  •  .,sVn){fvQ, . . .  JVk):T  oprgg 


'F;  A;  F  F  sxi :  Int  'F;  A;  F  F  sv2  ■  t 


^';A;FF  array^(s?;i,  5x2) :  Array32(r)  oprgg 

'F;  A;  F  F  sxi :  Array32(r)  'F;  A;  F  F  SV2  ■  Int 
'F;  A;  F  F  subT-(s?;i,  SV2)  ■  t  oprgj 

'F;  A;  F  F  sxi :  Array32(r)  'F;  A;  F  F  5x2  :  Int  'F;  A;  F  F  5x3  :  r 
'F;  A;  F  F  upd^(s?;i,  SV2,  SV3) :  Unit  optgj 

'I';  A;  r  h  St! :  Int  'I';  A;  F  h /y  :  (/>


A;r  h  array^(s?;,/?;) :  Arrayg4(/>  oprgj 


'F;  A;  r  h  sill :  Array g4(/>  'F;  A;  F  h  S7;2  :  Int  'F;  A;  F  h /y  : 

A;F  h  upd^(s?;i,  s?;2,/y) :  Unit  oprg^ 


Well-formed  Expression 


'F;  A;  F  h  e  :  r  exp 


A  h  Kj  ok  A,  a:K  h  Tj :  T32  A,  a:/?  h  (/>* :  T64 
'F;  A,  d:K;  F,  /:V[a:K](r)((?!>)  ^  r,  x:r,  X64:(/>  h  e  :  r  exp 
'F;  A;  F,  /:V[a:K](T)(0)  t  \-  e'  :t'  exp 

'F;  A;  F  h  let  rec^  /[d:K](x:r)(x64:(/>).e  ine' :  t'  exp 


d,x,xii,f  ^  A,F 


'F;  A;  F  h  sx  :  r 
'F;  A;  F  h  sx  :  r  exp 

'F;  A;  F  h  opr  :  r  oprgj  'F;  A;  F  h  opr  :  (j)  oprgj 

'F;  A;  F,  x:t  F  e  :  r'  exp  'F;  A;  F,  XQ/^-.ip  heir'  exp 

'F;  A;  F  h  let  x  =  opr  ine  :  r'  exp  'F;  A;  F  h  let  X64  =  fopr  ine  :  r'  exp 
'F;  A;  F  h  sr  :  3[k](c) 

'F;  A,  a:K;  F,  x:(ca)  h  e  :  r' exp  a  ^  fvij') 
_ a,  X  ^  A,  F 

'F;  A;  F  h  let[  a,  x]  =  unpack  sx  in  e  :  t'  exp 

A,/3:ki,7:k2,  A';F[(/3,7)/a]  h  e[(/3,7)/a]  :T[(/3,7)/a]  exp 
A,  a:Ki  X  K2,  A'  h  c  =  a  :  ki  x  K2 

- /3, 7  ^  A 

'F;  A,  aiKi  x  K2,  A';  F  h  let(/3, 7)  =  c ine  :  r  exp 

A;F  h  e[ci,  02//?,  7]  :r  exp 
A  h  C  =  (C1,C2)  :  Kl  X  K2  „  ,  . 

- p  ^  A 

A;  r  h  let(/3, 7)  =  cine:  r  exp 

A,  /3:Kl/uj.K/j],  A';rlfoldf^j,K  /^/a]  h  e[f  old^y^ /3/a]  :  r[f  old^y^ /3/a]  exp 
A,  a'.jij.K,  A'  h  c  =  a  :  p,j.K 

'F;  A,  a:p.j.K,  A';  F  h  let(f  old /3)  =  c  ine  :  r  exp 

'I';  A;  r  h  e[c'//3]  :  r  exp
A  h  c  =  f  c' :  iJ.j.K 


'I';  A;  r  h  let(fold (3)  =  c  ine  :  r  exp 


A,  a:Ki  +  K2)  A'  h  c  =  a  :  Ki  +  K2  ai,  02  ^  A,  A' 
A,/3:Ki,  A';r[iiijj/3/a]  h  e[inji/3/a]  :  r[inj^ /3/a]  exp 
'I';  A,  /3:Kj,  A';  r[iiijj  /3/a]  h  s?;[iiijj  /3/a]  :  Void  j  G  —  + 

A,  a:  +  [ki,  . . . ,  k^],  A'  h  let,-  inj^  /3  =  (c,  s?;)  ine  :  r  exp 


'I';  A;  r  h  e[c'//3]  :  r  exp 


'I';  A;  r  h  let,-  inj^  /3  =  (c,  s?;)  ine  :  r  exp 


Well-formed  hval 


h  hval :  T  hval 


•  h  Kj  ok  {i  £  1 ..  .k) 
ai:Ki, . . .  ,afc:Kfc  h  Ti  :T32  (iGl...m) 

ai.ACi,  •  •  •  ,  H  (pi  .  T64  ip  ^  1 . . .  n) 

ai:Ki, . .  ..ak'.Kk  h  r  1X32 

^5  *  *  *  ?  ^k'^k'!  ^1*^1  ,  .  .  .  ,  Xrri'^m')  zi:pi, . .  .,Zn-pn  h  e:r  exp 

I-  coder[ai:Ki, . . . ,  ak:Kk]{xi:Ti, . . . ,  Xm-Tm){zi:pi, . . . ,  Zn-pn) -e  : 
V[ai:Ki, . .  .,ak-.Kk]  Code(ri, . . .  •  •  ■,Pn){T)  hval 


Well-formed  heap 


'k  h  d  ok 


h  'k  ok 


'k  h  e  ok 


'k[£:r]  h  hval :  r  hval 
'k[f:r]  h  d  ok 

'k[£:r]  h  d,i\T  ^  hval  ok 


Well-formed  program 


\-  p:T 


'k(e) 

'k((i,  t.T 


del 

=  • 

hval)  'k(d), f:r 

'I', h  d  ok  'k, •  h  e  :  T  exp
'k  H  letrec  d  in  e  :  r 

Appendix C

TILTAL  static  and  dynamic  semantics 


C.l  TILTAL  Static  semantics 

Notation 


(t  032  O-)[0]32  =T 

(r  >32  a)[n  +  1)32  cr[n] 32 

{(j)  >64  CF)[n  +  2)32  cr[n]32 


((/)>64  O-)[0]64  =  </> 

(r  >32  cr)[n  +  1]64  0-[n]64 

((/>  >64  <y)[n  +  2]64  o-[n]64 


{t  >32  cr)[0]32  ^  t' 
((/)>64  cr)[0]32  ^  t' 
((/>>64  cr)[l]32  ^  t' 

{t  >32  cr)[n+  1]32  ^  t' 

((/)>64  cr)[n  +  2)32  ^  t' 


def 

def 

def 

def 

def 


t'  >32  a 

t'  >32  nS32  >32  d 
ns  32  >32  t'  >32  d 
(o-)N32  ^  t' 
(cr)[n] 32  ^  t' 


□ 


(n  >32  T2  >32  O-)[0]64 

^k 

def  1/ 

=  (p  >64  d 

(n  >32  0>64  O-)[0]64  • 

^k 

def  ,1 

=  (j)'  >64  nS32  >32  d 

((/)>64  O-)[0]64  ^  k 

def  1/ 

=  9  >64  d 

((/>>64  0-)[l]64  ^  k 

'==  nS32  >32  {nS32  >32  cr)[0] 

k  >32  d)[n+  1]64  ^ 

k 

'==  id)[n]64  ^  k 

((/)>64  fT)[n  +  2]64  ^ 

k 

(cr)N64  ^  k 

|e| 

|r  >32  d 
|(/>>64  d 


1  +  kl 

2  +  kl 


<!>' 

New well-formed kind rules


A  h  K  ok 


A  h  5r  ok 


Well-formed  Constructor 


Ah  c:  K 


Constants  and  their  kinds 

e:ST  >32  :T32  ST  ^  ST  >64  :T64  ST  ^  ST 
o:ST^ST^ST  sptr:5T^T32 
ns64:T64  ns32:T32  V  :T32list  ^  T32 


A  h  r  ok 
A  h  r  ^  0  :  T32 

Well-formed  coercion 

A  h  r  =  rec[K](c)(cp)  :T32 
A  h  toIIt  :  (c(Rec[«;]c)cp)  ^  r 

A  h  r  =  rec[K](c)(cp)  :T32 
A  |-  unroll,-  w  :  (c(Rec[K]c)cp)  ^  r 

A  h  c  =  y (tq  Ti  nil)  :  T32  list 

A  h  inj.unionj-j :Ti^  c 

A  h  T  =  3[k](c')  :  T32  Ah  c:k 
A  h  pack[r]c :  (c'  c)  ^  r 


Ah  q-.T  ^  t' 


A  h  t:T32 


A  H  inj.dyn^  :  (x[Dyntag(r),r])  ^  Dyn 


Well-formed  stack 


'k;  A  h  s:a 


h  A  ok 


A  h  e:e 


'k;  A  h  re  :  r 
'k;  A  h  s:a 


'k ;  A  h  re  >32  s  :  r  >32  cr 


^';Ah  /  :(/> 

'k;  A  h  s  :  fj 


'k ;  A  h  /  >32  s  :  (/>  >64  (7 




Well-formed  64-bit  value 


h  4'  ok  h  A  ok 


'k;  A  h  nsQ4  :  ns64 


h  'k  ok  h  A  ok  h  'k,  'k'  ok  HA  ok 

'k;AHr:Float  'k,  'k';  A  H  f :  i;A 


Well-formed  32-bit  value 


'k;  A  H  It; :  r 


H  'k,  £:t,  'k'  ok  HA  ok 
4^,  f:r4'';  A  H  f :  r 


H  4^  ok  H  A  ok 


4^;  A  H  i :  Int 


H  4^  ok  H  A  ok 


4^;  A  H  ns32  :  ns32 


H  4^  ok  H  A  ok  4^;  A  H  tr  :  V[k](c^)  AHc:k 


4';  A  H  tagi  : 

Tag(i) 

4^;  A  H  w[c]  :  c'c 

A  H  cj  =  ct'  :  5T 

(|cj'|  =  i) 

Ah  q-.Ti  ^  T2  4^ ;  a  H  It; :  Ti 

4^;  A  H  sptr(z) : 

sptr(cj) 

A  h  qw:  T2 

Well-formed  64-bit  operand 

4/;A;rH/?;:0 

II 

^;Ahl:^ 

r(sp)  =  CJ  Cj[t]64  =  4> 

vk;A;rH/:</> 

'If;  A;r  h  l-.cj) 

4^;  A;  r  H  sp(t) :  4> 

Well-formed  32-bit  operand 

4^;  A;  r  H  si; :  r 

t- 

II 

4';  A  H  tt; :  r 

r(sp)=C7  cj[t]32  =  r 

4';  A;  r  H  r  :  r 

4';  A;  r  H  tt; :  r 

4';  A;  r  H  sp(t) :  r 

^;A;r  H  s?;:V[k](c') 

AHc:tt  AHg:ri^T2  4';A;rHs?;:ri 

4';  A;  r  H  s?;[c] 

:  c'c 

4^;  A;  r  H  g  s?; :  r2 

Well-formed instruction


^;A;rhi^r 


A;  r  h  St! :  r 

A;  r  h  movr,  sv  ^  r{r:r} 

A;  r  h  r^  :  x  [tq,  . . .  ,Ti, . . . ,  Tn] 
4';  A;r  H  loadr  r^,  r5(i)  ^  rlr^ir*} 


'I';  A;  r  h  St! :  Tj 

A;  r  h  r^  :  x  [tq,  . . . ,  r*, . . . ,  Tn] 

A;  r  h  store  r5(i),  sr  ^  F 

'F;  A;r  h  srjiTj 

A;r  h  mallocr[ri, . . .  ,rn](sti, . .  .,sVn)  Fjr:  x  [n, . .  .,Tn]} 
'F;  A;  r  h  svi :  Int  'F;  A;  F  h  SV2  ■  t 
'F;  A;  F  h  malloc,-  r(s?;i,  SV2)  ^  F{r:Array32(r)} 

'F;  A;  F  h  sr  :  Int  'F;  A;  F  h /r  : 

A;F  h  malloc^  r(s?;,/t)  ^  F{r:Arrayg4(0)} 
^;A;Fh/r:</. 

'F;  A;  F  h  malloc^  r,fv  ^  F{r:  Boxed((/>)} 

'F;  A;F  h  sr  :  F{sp:(Fret  ^  0)  >32  a}  ^  0  (F(sp)  =  a) 

'F;  A;  F  h  call  sv  ^  Fret 

'F;  A;  F  h  sv  :  F{r:  Tag(z)}  ^  0 
A  h  F(r)  =  V('ro  Tk-i  ::  Tag(z)  ::  c) :  T32 

A;F  I-  brtag^r,  sr  ^  F{r:  V(to  Tk-i  ::  c)} 

'F;  A;  F  h  sr  :  F{r:  x  [Tag(i),  r]}  — >  0 
A  h  F(r)  =  V('ro  Tk-i  ::  •  -  Tn  ::  (x  [Tag(z), r])  ::  c)  1X32 

A;F  h  brtgd^  r,  sv  ^  F{r:  \/{to  Tn  ::  c)} 
T{wreg)  =  Dyn

'I';  A;  r  h  sill :  Dyntag(r)  'I';  A;  F  h  SV‘2  :  r{r:  x  [Dyntag(r),  r]}  — >  0 
'F;  A;  r  H  brdynr,  s?;i,  SV2  ^  F 

h  'F  ok  A  h  r  :  T32 
'F;  A;  F  h  dyntag^  r  F{r:  Dyntag(T)} 

'F;  A;  F  h  SI! :  r  F(sp)  =  ct 
A  \-  a  =  a' :  ST  (ex')  [i]  32  ^  t  =  a" 

'F;  A;  F  h  swrite  sp(i),  sv  ^  F{sp:fT"} 
h  ok  A  h  F  ok 

'F;  A;F  h  sallocn  ^  F{sp:  7x532  >32  -y  >32  xx-S32  l>32r(sp)} 

n 

F(sp)  =  a  A  h  a  =  ai  o  a2  '■  ST  |cji|  =  n 
'F;  A;  F  F  sf  ree  n  ^  F{sp:iT2} 

F(sp)  =  fj  A  h  fj  =  fJi  o  (0  i>g4  CJ2)  :  S'T  |cji|  =  (7x  — 1) 

'F;  A;  F  F  sfree  n  ^  F{sp:rxs32  >32  CX2} 

F(sp)  =  cj 

'F;  A;  F  F  movr,  sp  ^  F{r:  sptr(cj)} 

'F;  A;  F  F  st;  :  sptr(cji) 

A  F  F(sp)  =  CJ2  o  1T2  :  ST 

'F;  A;  F  F  mov  sp,  sv  F{sp  :cx2} 

'F;  A;  F  F  stxi  :  Array32(r)  'F;  A;  F  F  SV2  ■  Int 
'F;  A;  F  F  sub,-  r,  S7;i,  SV2  ^  F{r:r} 

'F;  A;  F  F  5771 :  Array32(r)  'F;  A;  F  F  5772  :  Int  'F;  A;  F  F  5773  :  r 
'F;  A;  F  F  upd^  S77i,  sv2,  5773  F 
^■,A-,Thfv:<P


'I';  A;  r  h  fmov  {,fv  r{f  :>;/)} 


A;  r  h  :  Boxed((/>) 

A;r  h  floadrfd,r5  ^  Tlf^ic/)} 


^■,A-,Th  fv:<P 
A;  r  h  r  :  Boxed((p) 

'I';  A;  r  h  store  r,fv  ^  F 


'F;  A;  r  h /?; :  i;A  r(sp)=cj 
A\-  a  =  a'  ■.  ST  ((T')[i]e4  <—(/>  =  cr" 

'I';  A;  r  h  f  swrite  sp(i),fv  ^  rjsprcj''} 


'F;  A;  r  h  svi :  Arrayg4(r)  'I';  A;  F  h  SV2  :  Int 
'F;  A;  F  h  sub^  f,  sri,  SV2  ^  F{f:i;A} 


'F;  A;  F  h  sri :  Arrayg4(r)  'F;  A;  F  h  s?;2  :  Int  'F;  A;  F  h /r  :  > 


A;F  H  upd^  sn,  st2,/t  ^  F 


Well-formed  Instruction  Sequence 


F(sp)  =  (Fret  ^  0)  >32  CT 
Fret  =  F{sp:cj} 


A;  F  h  sr  :  F  ^  0 
A;F  h  jmp  sv  ok 


'F;  A;  F  h  ret  ok 
«';A;Fhrt:r  ^';A;Fhi^F'  ^';A;F'h/ok 


'F;  A;  F  h  haltr  ok 


A;F  h  r,I  ok 


A;F  h  /  :r 


'F;  A;  F  h  sr  :  3[k](c)  'F;  A,  F{r:(ca)}  h  /  ok 


'F;  A;  F  H  unpack[Q!,  r],  sr;  /  ok 


a  ^  A 


'F[inj4^''''^^  oi/o];  A,  q;i:ki,  A';  F[inj4^''''^^  oi/a]  h  oi/a]  :  Void 

'F[inj2^'''''^  Q;2/a];  A,  a2:«^2,  A';  F[inj2^'''''^  02/0]  F a2/a]  ok 
A,  a:Ki  -|-  K2-,  A';  F  h  c  =  a  :  ki  +  ^2  oi,  02  ^  A,  A' 

'F;  A,  a:Ki  +  /t2,  A';  F  h  vcase[ai.  dead  sv,  a2]c;  I  ok 
ai/a];  A,  aiiKi,  A';  ai/a]  h ai/a]  ok

oi/a];  A,  Q;2:/^2,  A';  r[inj2^''''^^  02/0]  1“  02/0]  :  Void 

A,  a:Ki  +  K2)  A'  h  c  =  a  :  Ki  +  K2  oi,  02  ^  A,  A' 

'k;  A,  a:Ki  +  K2,  A'  h  vcase[ai,  02-  dead  s?;]c;  I  ok 


A  h  c  =  c' ;  Ki  +  K2  A;  r  h  /[c'/os]  ok 

'k;  A;  r  h  vcase[ai.  deads?;,  a2]c;  I)  ok 

A  (-  c  =  c' :  Ki  +  K2  'k;  A;  F  h  I[c' /ai]  ok 

'k;  A;  r  h  vcase[ai,  q;2.s?;]c;  I  ok 

^[(/3,7)/o];  A,/3:«i,7:«2,A';r[(/3,7)/a]  h  I[{(5,-i)/a]  ok 
A,  a:Ki  X  K2,  A'  h  c  =  a  :  ki  x  K2 

'k;  A,  a:Ki  x  K2,  A';  T  h  ref  ine[(/3, 7)]  c;  I  ok 


A;r  h  /[ci,C2//3,7]  ok 
A  h  C  =  (ci,C2)  :  Kl  X  K2 

'k;  A;  r  h  ref  ine[(/3, 7)]  c;  I  ok 

^'[fold^j.K/3/a];  A,/3:K[|Uj.?v/j],  A';r[fold^j.«,/3/Q;]  h  I[iold^j,^P/a]  ok 

A,  a:fij.K,  A'  h  c  =  a  :  fij.K 

'k;  A,  a-.fij.K,  A';  T  h  ref  ine[f  old  j3]  c;  I  ok 


^;A;r  h  /[cV/3]  ok 
A  h  c  =  f  old^j.K  d  :  /Uj./t 

'k;  A;  r  h  ref  ine[f  old /3]  c;  I  ok 


Well-formed  hval 


'k  h  hval :  r  hval 


'k;»|-??;i:Tj  i  =  I ..  .n 
'i’d  {wi,...,Wn)-.  X  [n,  .  .  .  ,  Tn]  hval 

'k;»|-??;j:r  ?  =  l...n 

'k  h  [??;i, . . . ,  ??;„]  :  Array32r  hval 
i  =  1 ..  .n


h  / :  Boxed  (j)  hval  'I'  h  [/i,  :  Arrayg^^  hval 


•  I-  t:T32 


'I';  •  h  dtag :  Dyntag^  hval 


ao:Ko, .  • . ,  ai-i'.Ki-i  h  Kj  ok  d:K  h  T  ok 
'I';  c?:/?;  r  h  /  ok 

|-  codeT-[a:Av]r./ :  V[a:K]r  ^  0  hval 


Well-formed  Heap 


h  'I'  ok  h  hvali :  'l'(fj)  hval 

- . . .  ,fn:Tn} 

'k  h  {li'.Ti  ^  hvali, . . .  ,in-Tn  hvaln}  ok 


Well-formed  Register  File 


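\[ \Psi \vdash R : \Gamma \]

\[
\frac{\begin{array}{c}
\Psi; \cdot \vdash w_1 : \tau_1 \qquad \Psi; \cdot \vdash w_2 : \tau_2 \qquad \Psi; \cdot \vdash w_e : \tau_e \qquad \Psi; \cdot \vdash w_t : \tau_t \\
\Psi; \cdot \vdash l_1 : \phi_1 \qquad \Psi; \cdot \vdash l_2 : \phi_2 \qquad \Psi; \cdot \vdash s : \sigma
\end{array}}
{\begin{array}{l}
\Psi \vdash \{r_1 \mapsto w_1, r_2 \mapsto w_2, r_e \mapsto w_e, r_t \mapsto w_t, f_1 \mapsto l_1, f_2 \mapsto l_2, \mathsf{sp} \mapsto s\} \\
\qquad : \{r_1{:}\tau_1, r_2{:}\tau_2, r_e{:}\tau_e, r_t{:}\tau_t, f_1{:}\phi_1, f_2{:}\phi_2, \mathsf{sp}{:}\sigma\}
\end{array}}
\]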


Well-formed  Program 


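\[ \vdash (H, R, I)\ \mathrm{ok} \]

\[
\frac{\vdash H : \Psi \qquad \Psi \vdash R : \Gamma \qquad \Psi; \cdot; \Gamma \vdash I\ \mathrm{ok}}
     {\vdash (H, R, I)\ \mathrm{ok}}
\]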

C.2  TILTAL  dynamic  semantics 

C.2.1  Definitions 


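\[
\begin{array}{rcl}
(w \rhd_{32} s)[0]_{32} & \stackrel{\mathrm{def}}{=} & w \\
(l \rhd_{64} s)[0]_{64} & \stackrel{\mathrm{def}}{=} & l \\
(w \rhd_{32} s)[n+1]_{32} & \stackrel{\mathrm{def}}{=} & s[n]_{32} \\
(w \rhd_{32} s)[n+1]_{64} & \stackrel{\mathrm{def}}{=} & s[n]_{64} \\
(l \rhd_{64} s)[n+2]_{32} & \stackrel{\mathrm{def}}{=} & s[n]_{32} \\
(l \rhd_{64} s)[n+2]_{64} & \stackrel{\mathrm{def}}{=} & s[n]_{64}
\end{array}
\]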
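The stack-indexing definitions above can be read as a pair of recursive lookups over a stack whose slots are either one word (32-bit) or two words (64-bit) wide. The following is a minimal Python sketch of that reading, not part of TILTAL itself: the class names `W32` and `F64`, the tuple representation of stacks, and the choice to raise an exception where the definitions give no clause are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class W32:
    """A 32-bit stack slot; occupies one word offset (hypothetical name)."""
    value: object


@dataclass(frozen=True)
class F64:
    """A 64-bit stack slot; occupies two word offsets (hypothetical name)."""
    value: float


def read32(stack, n):
    """s[n]_32: the 32-bit word that starts at word offset n."""
    if not stack:
        raise IndexError("offset past the end of the stack")
    head, rest = stack[0], stack[1:]
    if isinstance(head, W32):
        # (w |>32 s)[0]_32 = w ; (w |>32 s)[n+1]_32 = s[n]_32
        return head.value if n == 0 else read32(rest, n - 1)
    if n >= 2:
        # (l |>64 s)[n+2]_32 = s[n]_32
        return read32(rest, n - 2)
    # No clause covers reading a word out of a 64-bit slot.
    raise ValueError("offset falls inside a 64-bit slot")


def read64(stack, n):
    """s[n]_64: the 64-bit value that starts at word offset n."""
    if not stack:
        raise IndexError("offset past the end of the stack")
    head, rest = stack[0], stack[1:]
    if isinstance(head, W32):
        if n == 0:
            raise ValueError("offset 0 holds a 32-bit slot")
        # (w |>32 s)[n+1]_64 = s[n]_64
        return read64(rest, n - 1)
    if n == 0:
        # (l |>64 s)[0]_64 = l
        return head.value
    if n >= 2:
        # (l |>64 s)[n+2]_64 = s[n]_64
        return read64(rest, n - 2)
    raise ValueError("offset falls inside a 64-bit slot")
```

For a stack `(W32(7), F64(2.5), W32(9))`, word offset 1 holds the float and word offset 3 holds the final word, because the float accounts for offsets 1 and 2.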
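\[
\begin{array}{rcl}
(w \rhd_{32} s)[0]_{32} \leftarrow w' & \stackrel{\mathrm{def}}{=} & w' \rhd_{32} s \\
(l \rhd_{64} s)[0]_{32} \leftarrow w' & \stackrel{\mathrm{def}}{=} & w' \rhd_{32} ns_{32} \rhd_{32} s \\
(l \rhd_{64} s)[1]_{32} \leftarrow w' & \stackrel{\mathrm{def}}{=} & ns_{32} \rhd_{32} w' \rhd_{32} s \\
(w \rhd_{32} s)[n+1]_{32} \leftarrow w' & \stackrel{\mathrm{def}}{=} & w \rhd_{32} \bigl((s)[n]_{32} \leftarrow w'\bigr) \\
(l \rhd_{64} s)[n+2]_{32} \leftarrow w' & \stackrel{\mathrm{def}}{=} & l \rhd_{64} \bigl((s)[n]_{32} \leftarrow w'\bigr) \\[1ex]
(w_1 \rhd_{32} w_2 \rhd_{32} s)[0]_{64} \leftarrow l' & \stackrel{\mathrm{def}}{=} & l' \rhd_{64} s \\
(w_1 \rhd_{32} l \rhd_{64} s)[0]_{64} \leftarrow l' & \stackrel{\mathrm{def}}{=} & l' \rhd_{64} ns_{32} \rhd_{32} s \\
(l \rhd_{64} s)[0]_{64} \leftarrow l' & \stackrel{\mathrm{def}}{=} & l' \rhd_{64} s \\
(l \rhd_{64} s)[1]_{64} \leftarrow l' & \stackrel{\mathrm{def}}{=} & ns_{32} \rhd_{32} \bigl((ns_{32} \rhd_{32} s)[0]_{64} \leftarrow l'\bigr) \\
(w \rhd_{32} s)[n+1]_{64} \leftarrow l' & \stackrel{\mathrm{def}}{=} & w \rhd_{32} \bigl((s)[n]_{64} \leftarrow l'\bigr) \\
(l \rhd_{64} s)[n+2]_{64} \leftarrow l' & \stackrel{\mathrm{def}}{=} & l \rhd_{64} \bigl((s)[n]_{64} \leftarrow l'\bigr)
\end{array}
\]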
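The stack-update definitions above differ from the lookups in one respect: a write may land on part of a slot of the other width, in which case the leftover half is replaced by the nonsense word ns32. The Python sketch below illustrates this under stated assumptions. The names `W32`, `F64`, and `NS32` and the tuple representation are hypothetical, and the recursive cases assume that the skipped head slot is kept in place, which the extracted text of the definitions does not show clearly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class W32:
    """A 32-bit stack slot (hypothetical name)."""
    value: object


@dataclass(frozen=True)
class F64:
    """A 64-bit stack slot (hypothetical name)."""
    value: float


# Placeholder for the nonsense word ns32 (assumed representation).
NS32 = W32("ns32")


def write32(stack, n, w):
    """(s)[n]_32 <- w: overwrite one word at word offset n."""
    if not stack:
        raise IndexError("offset past the end of the stack")
    head, rest = stack[0], stack[1:]
    if isinstance(head, W32):
        if n == 0:
            return (W32(w),) + rest
        return (head,) + write32(rest, n - 1, w)
    if n == 0:
        # Overwriting the first half of a float invalidates the second half.
        return (W32(w), NS32) + rest
    if n == 1:
        # Overwriting the second half invalidates the first half.
        return (NS32, W32(w)) + rest
    return (head,) + write32(rest, n - 2, w)


def write64(stack, n, l):
    """(s)[n]_64 <- l: overwrite two words starting at word offset n."""
    if not stack:
        raise IndexError("offset past the end of the stack")
    head, rest = stack[0], stack[1:]
    if isinstance(head, F64):
        if n == 0:
            return (F64(l),) + rest
        if n == 1:
            # Split the float, then write at offset 0 of what remains.
            return (NS32,) + write64((NS32,) + rest, 0, l)
        return (head,) + write64(rest, n - 2, l)
    if n > 0:
        return (head,) + write64(rest, n - 1, l)
    # n == 0 with a 32-bit head: the write also consumes the next word.
    if not rest:
        raise IndexError("not enough room for a 64-bit value")
    second, rest2 = rest[0], rest[1:]
    if isinstance(second, W32):
        return (F64(l),) + rest2
    return (F64(l), NS32) + rest2
```

Every clause preserves the total number of word offsets on the stack, which is a quick sanity check on the reconstruction: for example, writing a float at offset 0 of `(W32(1), F64(2.0), W32(3))` yields `(F64(5.0), NS32, W32(3))`, still four words wide.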

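\[
R(sv) \stackrel{\mathrm{def}}{=}
\begin{cases}
R(r) & \text{when } sv = r \\
R(\mathsf{sp})[i]_{32} & \text{when } sv = \mathsf{sp}(i) \\
w & \text{when } sv = w \\
\mathsf{tag}_i & \text{when } sv = \mathsf{tag}_i \\
R(sv')[c] & \text{when } sv = sv'[c] \\
q\ R(sv') & \text{when } sv = q\ sv' \\
\mathsf{pack}\ R(sv')\ \mathsf{as}\ \tau\ \mathsf{hiding}\ c & \text{when } sv = \mathsf{pack}\ sv'\ \mathsf{as}\ \tau\ \mathsf{hiding}\ c
\end{cases}
\]

\[
R(fv) \stackrel{\mathrm{def}}{=}
\begin{cases}
R(f) & \text{when } fv = f \\
R(\mathsf{sp})[i]_{64} & \text{when } fv = \mathsf{sp}(i)
\end{cases}
\]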


C.2.2  Transitions 
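\[ (H, R, I) \longmapsto P \quad \text{where} \]

\[
\begin{array}{ll}
\textbf{if } I = & \textbf{then } P = \\
\hline
\mathtt{mov}\ r, sv;\ I' & (H,\ R\{r \mapsto R(sv)\},\ I') \\[1ex]
\mathtt{load}\ r_d, r_s(i);\ I' & (H,\ R\{r_d \mapsto w_i\},\ I') \\
 & \text{where } R(r_s) = \ell \text{ and } H(\ell) = \langle w_0, \ldots, w_{n-1} \rangle \\[1ex]
\mathtt{store}\ r_d(i), sv;\ I' & (H\{\ell \mapsto \langle w_0, \ldots, R(sv), \ldots, w_{n-1} \rangle\},\ R,\ I') \\
 & \text{where } R(r_d) = \ell \text{ and } H(\ell) = \langle w_0, \ldots, w_i, \ldots, w_{n-1} \rangle \\[1ex]
\mathtt{malloc}\ r_d[\tau_1, \ldots, \tau_n](sv_0, \ldots, sv_{n-1});\ I' & (H\{\ell \mapsto \langle R(sv_0), \ldots, R(sv_{n-1}) \rangle\},\ R\{r_d \mapsto \ell\},\ I') \\
 & \text{where } \ell \notin H \\[1ex]
\mathtt{malloc}_{\tau}\ r_d(sv_1, sv_2);\ I' & (H\{\ell \mapsto [\underbrace{R(sv_2), \ldots, R(sv_2)}_{R(sv_1)}]\},\ R\{r_d \mapsto \ell\},\ I') \\
 & \text{where } \ell \notin H \\[1ex]
\mathtt{malloc}_{\phi}\ r_d, fv;\ I' & (H\{\ell \mapsto R(fv)\},\ R\{r_d \mapsto \ell\},\ I') \\
 & \text{where } \ell \notin H \\[1ex]
\mathtt{malloc}_{\phi}\ r_d(sv, fv);\ I' & (H\{\ell \mapsto [\underbrace{R(fv), \ldots, R(fv)}_{R(sv)}]\},\ R\{r_d \mapsto \ell\},\ I') \\
 & \text{where } \ell \notin H \\[1ex]
\mathtt{dyntag}_{\tau}\ r;\ I' & (H\{\ell \mapsto \mathtt{exn\_tag}\},\ R\{r \mapsto \ell\},\ I') \\
 & \text{where } \ell \notin H \\[1ex]
\mathtt{swrite}\ \mathsf{sp}(i), sv;\ I' & (H,\ R\{\mathsf{sp} \mapsto s'\},\ I') \\
 & \text{where } s' = (R(\mathsf{sp}))[i]_{32} \leftarrow R(sv) \\[1ex]
\mathtt{salloc}\ n;\ I' & (H,\ R\{\mathsf{sp} \mapsto \underbrace{ns_{32} \rhd_{32} \cdots \rhd_{32} ns_{32}}_{n} \rhd_{32} R(\mathsf{sp})\},\ I') \\[1ex]
\mathtt{sfree}\ n;\ I' & (H,\ R\{\mathsf{sp} \mapsto s'\},\ I') \\
 & \text{where } \mathrm{pop}(R(\mathsf{sp}), n) = s' \\[1ex]
\mathtt{mov}\ r, \mathsf{sp};\ I' & (H,\ R\{r \mapsto \mathsf{sptr}(|R(\mathsf{sp})|)\},\ I') \\[1ex]
\mathtt{mov}\ \mathsf{sp}, sv;\ I' & (H,\ R\{\mathsf{sp} \mapsto s'\},\ I') \\
 & \text{where } R(\mathsf{sp}) = s \text{ and } R(sv) = \mathsf{sptr}(|s'|) \\
 & \text{and } s' \text{ is a suffix of } s
\end{array}
\]

Figure C.1: TILTAL transitions (part I)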
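The `salloc` and `sfree` transitions in Figure C.1 can be illustrated with a small Python sketch. The `pop` function is used by the `sfree` transition but is not defined in this section, so the version below is an assumption: it mirrors the `sfree` typing rules, in which freeing only the first half of a 64-bit slot leaves an ns32 word behind. The names `W32`, `F64`, and `NS32` and the tuple representation of stacks are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class W32:
    """A 32-bit stack slot (hypothetical name)."""
    value: object


@dataclass(frozen=True)
class F64:
    """A 64-bit stack slot (hypothetical name)."""
    value: float


# Placeholder for the nonsense word ns32 (assumed representation).
NS32 = W32("ns32")


def salloc(stack, n):
    """salloc n: push n uninitialized 32-bit words onto the stack."""
    return (NS32,) * n + stack


def pop(stack, n):
    """Assumed reading of pop(s, n): drop n word offsets from the top."""
    if n == 0:
        return stack
    if not stack:
        raise IndexError("cannot free past the bottom of the stack")
    head, rest = stack[0], stack[1:]
    if isinstance(head, W32):
        return pop(rest, n - 1)
    if n == 1:
        # Freeing half of a float leaves one nonsense word in its place.
        return (NS32,) + rest
    return pop(rest, n - 2)
```

Under these assumptions `pop(salloc(s, n), n)` returns `s` unchanged, which matches the intent that `sfree n` undoes `salloc n`.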
(H, R, I) ↦ P where

callp  sv]  I' 

{H{1  ^  code[].r./'},  i?{sp  1  >32  i?(sp)},  /^[c/A]) 
where  R{sv)  =  ^^[c]  and  H{£d)  =  code[A]r./(i 
and  £  ^  H 

brtagj  r,sv;P 

when  R{r)  =  tag- 

(i7,ii{r  ^  tagj,/^) 

where  R{sv)  =  i  and  H(i)  =  codeyr./^; 

brtagj  r,sv;P 

when  R{r)  /  tag^ 

{H,R{r  ^  inj.union*^,  ^^^)  tagj,/') 
where  ci  normalizes  to  tq 
and  Tk  =  Tag(i) 

and  c'l  =  To  Tk-i  Tk+i  ■  -  Tn 

brtgdj  r,  sv;  I' 

when  R{r)  =  inj_union^^^ 

and  =  (tagj,r(;) 

where  R{sv)  =  £  and  =  codeyr./^; 

brtgdj  r,  sv]  I' 

when  R{r)  =  tag^ 

{H,R{r  ^  ta.g,^},Id) 

where  C2  normalizes  to  tq 

and  Tk  =  x[Tag(i),T^] 

and  C2  —  tq  ••  *  *  *  ••  Tk—i  ••  ..  *  *  *  Tfi 

brtgdj  r,  sv,  I' 

when  R{r)  =  inj_union^^^  £ 

and  H{^l)  =  (tag^,?/;)  (i  k) 

{H,R{r  ^  inj_union*(^^^^^^  £},Id) 
where  C2  normalizes  to  tq 
and  Tk  =  x[Tag(i),r^] 

and  C2  Tq  ••  *  *  *  ••  Tk—X  ••  ••  *  *  *  Tfi 

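\[
\begin{array}{ll}
\mathtt{ret} & (H,\ R\{\mathsf{sp} \mapsto s'\},\ I[c/\Delta]) \\
 & \text{where } R(\mathsf{sp}) = \ell[c] \rhd_{32} s' \text{ and } H(\ell) = \mathsf{code}[\Delta]\Gamma.I \\[1ex]
\mathtt{jmp}\ sv & (H,\ R,\ I[c/\Delta]) \\
 & \text{where } R(sv) = \ell[c] \text{ and } H(\ell) = \mathsf{code}[\Delta]\Gamma.I
\end{array}
\]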

Figure  C.2:  TILTAL  transitions  (part  II) 
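\[ (H, R, I) \longmapsto P \quad \text{where} \]

\[
\begin{array}{ll}
\textbf{if } I = & \textbf{then } P = \\
\hline
\mathtt{unpack}[\alpha, r], sv;\ I' & (H,\ R\{r \mapsto w\},\ I'[c/\alpha]) \\
 & \text{where } R(sv) = \mathsf{pack}\ w\ \mathsf{as}\ \tau\ \mathsf{hiding}\ c \\[1ex]
\mathtt{vcase}(c,\ \alpha_1.\ \mathtt{dead}\ sv,\ \alpha_2);\ I' & (H,\ R,\ I'[c'/\alpha_2]) \\
 & \text{where } c \text{ normalizes to } \mathsf{inj}_2\ c' \\[1ex]
\mathtt{vcase}(c,\ \alpha_1,\ \alpha_2.\ \mathtt{dead}\ sv);\ I' & (H,\ R,\ I'[c'/\alpha_1]) \\
 & \text{where } c \text{ normalizes to } \mathsf{inj}_1\ c' \\[1ex]
(\beta, \gamma) = c;\ I' & (H,\ R,\ I'[c_1, c_2/\beta, \gamma]) \\
 & \text{where } c \text{ normalizes to } (c_1, c_2) \\[1ex]
(\mathsf{fold}\ \beta) = c;\ I' & (H,\ R,\ I'[c'/\beta]) \\
 & \text{where } c \text{ normalizes to } \mathsf{fold}_{\mu j.\kappa}\ c' \\[1ex]
\mathtt{sub}_{\tau}\ r, sv_1, sv_2;\ I' & (H,\ R\{r \mapsto w_i\},\ I') \\
 & \text{where } R(sv_1) = \ell \text{ and } R(sv_2) = i \\
 & \text{and } H(\ell) = [w_0, \ldots, w_{n-1}] \text{ and } 0 \le i < n \\[1ex]
\mathtt{upd}_{\tau}\ sv_1, sv_2, sv_3;\ I' & (H\{\ell \mapsto [w_0, \ldots, w_{i-1}, R(sv_3), w_{i+1}, \ldots, w_{n-1}]\},\ R,\ I') \\
 & \text{where } R(sv_1) = \ell \text{ and } R(sv_2) = i \\
 & \text{and } H(\ell) = [w_0, \ldots, w_{n-1}] \text{ and } 0 \le i < n \\[1ex]
\mathtt{fmov}\ f, fv;\ I' & (H,\ R\{f \mapsto R(fv)\},\ I') \\[1ex]
\mathtt{fload}\ f, r;\ I' & (H,\ R\{f \mapsto l\},\ I') \\
 & \text{where } R(r) = \ell \text{ and } H(\ell) = l \\[1ex]
\mathtt{fstore}\ r, fv;\ I' & (H\{\ell \mapsto l\},\ R\{r \mapsto \ell\},\ I') \\
 & \text{where } R(fv) = l \text{ and } \ell \notin H \\[1ex]
\mathtt{fswrite}\ \mathsf{sp}(i), fv;\ I' & (H,\ R\{\mathsf{sp} \mapsto s'\},\ I') \\
 & \text{where } s' = (R(\mathsf{sp}))[i]_{64} \leftarrow R(fv) \\[1ex]
\mathtt{sub}_{\phi}\ f, sv_1, sv_2;\ I' & (H,\ R\{f \mapsto l_i\},\ I') \\
 & \text{where } R(sv_1) = \ell \text{ and } R(sv_2) = i \\
 & \text{and } H(\ell) = [l_0, \ldots, l_{n-1}] \text{ and } 0 \le i < n \\[1ex]
\mathtt{upd}_{\phi}\ sv_1, sv_2, fv;\ I' & (H\{\ell \mapsto [l_0, \ldots, l_{i-1}, R(fv), l_{i+1}, \ldots, l_{n-1}]\},\ R,\ I') \\
 & \text{where } R(sv_1) = \ell \text{ and } R(sv_2) = i \\
 & \text{and } H(\ell) = [l_0, \ldots, l_{n-1}] \text{ and } 0 \le i < n
\end{array}
\]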

Figure  C.3:  TILTAL  transitions  (part  III) 
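The array transitions in Figure C.3 step only when the index is within bounds; outside that range the machine has no applicable transition. The Python sketch below illustrates the 32-bit `sub` and `upd` steps under assumptions of its own: the heap and register file are modelled as dictionaries, arrays as tuples, and a stuck state is reported by raising `IndexError`. None of these representation choices come from the thesis.

```python
def step_sub(heap, regs, r, sv1, sv2):
    """sub r, sv1, sv2: load element R(sv2) of the array at label R(sv1)."""
    label, i = regs[sv1], regs[sv2]
    array = heap[label]
    if not 0 <= i < len(array):
        # No transition applies: the abstract machine is stuck.
        raise IndexError("array index out of bounds")
    new_regs = dict(regs)
    new_regs[r] = array[i]
    return heap, new_regs


def step_upd(heap, regs, sv1, sv2, sv3):
    """upd sv1, sv2, sv3: store R(sv3) at index R(sv2) of the array at R(sv1)."""
    label, i = regs[sv1], regs[sv2]
    array = heap[label]
    if not 0 <= i < len(array):
        raise IndexError("array index out of bounds")
    new_heap = dict(heap)
    # Rebuild the array with only position i replaced.
    new_heap[label] = array[:i] + (regs[sv3],) + array[i + 1:]
    return new_heap, regs
```

Both functions return fresh maps rather than mutating their arguments, which keeps each step a function from one machine state to the next, as in the figure. Operands are restricted to registers here for brevity; the figure allows any small value.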